| 2025-05-23 | REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders | Savya Khosla et al. | 2505.18153v1 | null |
| 2025-05-23 | Stochastic agent-based Monte Carlo simulations for reaction-diffusion models, population dynamics, and epidemic spreading | Mohamed Swailem et al. | 2505.18145v1 | null |
| 2025-05-23 | TokBench: Evaluating Your Visual Tokenizer before Visual Generation | Junfeng Wu et al. | 2505.18142v1 | null |
| 2025-05-23 | Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems | Gordon Dai et al. | 2505.18139v1 | null |
| 2025-05-23 | VideoGameBench: Can Vision-Language Models complete popular video games? | Alex L. Zhang et al. | 2505.18134v1 | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et al. | 2505.18129v1 | null |
| 2025-05-23 | Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion | Jacob Hansen et al. | 2505.18115v1 | null |
| 2025-05-23 | Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking | Cheng-Yen Yang et al. | 2505.18111v1 | null |
| 2025-05-23 | Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM | Zinuo Li et al. | 2505.18110v1 | null |
| 2025-05-23 | CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | Hyungyung Lee et al. | 2505.18087v1 | null |
| 2025-05-23 | DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | Junhao Chen et al. | 2505.18078v1 | null |
| 2025-05-23 | Beyond flat-panel displays, applications of stereographic and holographic devices in 3D microscopy data analysis | Yong Wan et al. | 2505.18075v1 | null |
| 2025-05-23 | Towards Uncertainty Aware Task Delegation and Human-AI Collaborative Decision-Making | Min Hun Lee et al. | 2505.18066v1 | null |
| 2025-05-23 | Asymptotically optimal regret in communicating Markov decision processes | Victor Boone et al. | 2505.18064v1 | null |
| 2025-05-23 | Semantic Correspondence: Unified Benchmarking and a Strong Baseline | Kaiyan Zhang et al. | 2505.18060v1 | link |
| 2025-05-23 | A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer | Yumeng Zhang et al. | 2505.18058v1 | null |
| 2025-05-23 | LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision | Anthony Fuller et al. | 2505.18051v1 | null |
| 2025-05-23 | SpikeGen: Generative Framework for Visual Spike Stream Processing | Gaole Dai et al. | 2505.18049v1 | null |
| 2025-05-23 | RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration | Sudarshan Rajagopalan et al. | 2505.18047v1 | null |
| 2025-05-23 | Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions | Yizhou Xu et al. | 2505.18046v1 | null |
| 2025-05-23 | Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | Li Zhong et al. | 2505.18039v1 | null |
| 2025-05-23 | Efficient Conditional Gradient Methods for Solving Stochastic Convex Bilevel Optimization Problems | Khanh-Hung Giang-Tran et al. | 2505.18037v1 | null |
| 2025-05-23 | CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention | Naseem Khan et al. | 2505.18035v1 | null |
| 2025-05-23 | Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons | Hazhar Rahmani et al. | 2505.18030v1 | null |
| 2025-05-23 | A Wavelet-based Stereo Matching Framework for Solving Frequency Convergence Inconsistency | Xiaobao Wei et al. | 2505.18024v1 | null |
| 2025-05-23 | RemoteSAM: Towards Segment Anything for Earth Observation | Liang Yao et al. | 2505.18022v1 | null |
| 2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et al. | 2505.18015v1 | null |
| 2025-05-23 | DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence | Hanze Zhang et al. | 2505.18013v1 | null |
| 2025-05-23 | TRACE for Tracking the Emergence of Semantic Representations in Transformers | Nura Aljaafari et al. | 2505.17998v1 | null |
| 2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et al. | 2505.17994v1 | null |