2025-05-23 | REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders | Savya Khosla et al. | 2505.18153v1 | null |
2025-05-23 | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | Zizhang Li et al. | 2505.18151v1 | null |
2025-05-23 | Generative Distribution Embeddings | Nic Fishman et al. | 2505.18150v1 | null |
2025-05-23 | First Finish Search: Efficient Test-Time Scaling in Large Language Models | Aradhye Agarwal et al. | 2505.18149v1 | null |
2025-05-23 | TokBench: Evaluating Your Visual Tokenizer before Visual Generation | Junfeng Wu et al. | 2505.18142v1 | null |
2025-05-23 | INN-FF: A Scalable and Efficient Machine Learning Potential for Molecular Dynamics | Taskin Mehereen et al. | 2505.18141v1 | null |
2025-05-23 | UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification | Poojah Ganesan et al. | 2505.18122v1 | null |
2025-05-23 | Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Huayu Chen et al. | 2505.18116v1 | null |
2025-05-23 | Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion | Jacob Hansen et al. | 2505.18115v1 | null |
2025-05-23 | Accelerating Learned Image Compression Through Modeling Neural Training Dynamics | Yichi Zhang et al. | 2505.18107v1 | null |
2025-05-23 | F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles | Varun Ajith et al. | 2505.18106v1 | null |
2025-05-23 | Structural Dynamics of Harmful Content Dissemination on WhatsApp | Yuxin Liu et al. | 2505.18099v1 | null |
2025-05-23 | Early-Exit Graph Neural Networks | Andrea Giuseppe Di Francesco et al. | 2505.18088v1 | null |
2025-05-23 | DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | Junhao Chen et al. | 2505.18078v1 | null |
2025-05-23 | Image rotation in plasmas | Renaud Gueroult et al. | 2505.18062v1 | null |
2025-05-23 | Posted Pricing and Competition in Large Markets | José Correa et al. | 2505.18061v1 | null |
2025-05-23 | Semantic Correspondence: Unified Benchmarking and a Strong Baseline | Kaiyan Zhang et al. | 2505.18060v1 | link |
2025-05-23 | A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer | Yumeng Zhang et al. | 2505.18058v1 | null |
2025-05-23 | BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching | Zhihua Liu et al. | 2505.18052v1 | null |
2025-05-23 | LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision | Anthony Fuller et al. | 2505.18051v1 | null |
2025-05-23 | SpikeGen: Generative Framework for Visual Spike Stream Processing | Gaole Dai et al. | 2505.18049v1 | null |
2025-05-23 | SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios | Simon Malzard et al. | 2505.18048v1 | null |
2025-05-23 | RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration | Sudarshan Rajagopalan et al. | 2505.18047v1 | null |
2025-05-23 | Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | Li Zhong et al. | 2505.18039v1 | null |
2025-05-23 | CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention | Naseem Khan et al. | 2505.18035v1 | null |
2025-05-23 | Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Zizhao Chen et al. | 2505.18028v1 | null |
2025-05-23 | A Wavelet-based Stereo Matching Framework for Solving Frequency Convergence Inconsistency | Xiaobao Wei et al. | 2505.18024v1 | null |
2025-05-23 | RemoteSAM: Towards Segment Anything for Earth Observation | Liang Yao et al. | 2505.18022v1 | null |
2025-05-23 | Building Floor Number Estimation from Crowdsourced Street-Level Images: Munich Dataset and Baseline Method | Yao Sun et al. | 2505.18021v1 | null |
2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et al. | 2505.18015v1 | null |