2025-05-23 | REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders | Savya Khosla et al. | 2505.18153v1 | null
2025-05-23 | Boosting Open Set Recognition Performance through Modulated Representation Learning | Amit Kumar Kundu et al. | 2505.18137v1 | null
2025-05-23 | Frankentext: Stitching random text fragments into long-form narratives | Chau Minh Pham et al. | 2505.18128v1 | null
2025-05-23 | From Temporal to Spatial: Designing Spatialized Interactions with Segmented-audios in Immersive Environments for Active Engagement with Performing Arts Intangible Cultural Heritage | Yuqi Wang et al. | 2505.18112v1 | null
2025-05-23 | Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking | Cheng-Yen Yang et al. | 2505.18111v1 | null
2025-05-23 | F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles | Varun Ajith et al. | 2505.18106v1 | null
2025-05-23 | CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | Hyungyung Lee et al. | 2505.18087v1 | null
2025-05-23 | Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | Xiaoyi Zhang et al. | 2505.18079v1 | null
2025-05-23 | BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching | Zhihua Liu et al. | 2505.18052v1 | null
2025-05-23 | LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision | Anthony Fuller et al. | 2505.18051v1 | null
2025-05-23 | RemoteSAM: Towards Segment Anything for Earth Observation | Liang Yao et al. | 2505.18022v1 | null
2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et al. | 2505.18015v1 | null
2025-05-23 | Classification of assembly tasks combining multiple primitive actions using Transformers and xLSTMs | Miguel Neves et al. | 2505.18012v1 | null
2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et al. | 2505.17994v1 | null
2025-05-23 | Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling | Bryan Wong et al. | 2505.17982v1 | null
2025-05-23 | MR-EEGWaveNet: Multiresolutional EEGWaveNet for Seizure Detection from Long EEG Recordings | Kazi Mahmudul Hassan et al. | 2505.17972v1 | null
2025-05-23 | Explainable Anatomy-Guided AI for Prostate MRI: Foundation Models and In Silico Clinical Trials for Virtual Biopsy-based Risk Assessment | Danial Khan et al. | 2505.17971v1 | null
2025-05-23 | Optimizing QAOA circuit transpilation with parity twine and SWAP network encodings | J. A. Montanez-Barrera et al. | 2505.17944v1 | null
2025-05-23 | AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models | Xingjian Li et al. | 2505.17931v1 | null
2025-05-23 | Promptable cancer segmentation using minimal expert-curated data | Lynn Karam et al. | 2505.17915v1 | null
2025-05-23 | Semantic segmentation with reward | Xie Ting et al. | 2505.17905v1 | null
2025-05-23 | DataRater: Meta-Learned Dataset Curation | Dan A. Calian et al. | 2505.17895v1 | null
2025-05-23 | Track Anything Annotate: Video annotation and dataset generation of computer vision models | Nikita Ivanov et al. | 2505.17884v1 | null
2025-05-23 | DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization | Hongshu Guo et al. | 2505.17866v1 | null
2025-05-23 | Generative Data Augmentation for Object Point Cloud Segmentation | Dekai Zhu et al. | 2505.17783v1 | null
2025-05-23 | Hephaestus Minicubes: A Global, Multi-Modal Dataset for Volcanic Unrest Monitoring | Nikolas Papadopoulos et al. | 2505.17782v1 | null
2025-05-23 | But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors | Leon Eshuijs et al. | 2505.17760v1 | null
2025-05-23 | MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization | Zeyuan Ma et al. | 2505.17745v1 | null
2025-05-23 | Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM | Donghwan Chi et al. | 2505.17726v1 | null
2025-05-23 | SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation | Dekai Zhu et al. | 2505.17721v1 | null