Vision Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-06-20 | VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Zhangyang Qi et al. | 2506.17221v1 | null |
| 2025-06-20 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et al. | 2506.17220v1 | link |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et al. | 2506.17218v1 | null |
| 2025-06-20 | Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation | Xiuyu Yang et al. | 2506.17213v1 | null |
| 2025-06-20 | Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting | Tianjiao Yu et al. | 2506.17212v1 | null |
| 2025-06-20 | DreamCube: 3D Panorama Generation via Multi-plane Synchronization | Yukun Huang et al. | 2506.17206v1 | null |
| 2025-06-20 | UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation | Teng Li et al. | 2506.17202v1 | null |
| 2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et al. | 2506.17201v1 | null |
| 2025-06-20 | Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation | Jianglong Ye et al. | 2506.17198v1 | null |
| 2025-06-20 | Facial Landmark Visualization and Emotion Recognition Through Neural Networks | Israel Juárez-Jiménez et al. | 2506.17191v1 | null |
| 2025-06-20 | YASMOT: Yet another stereo image multi-object tracker | Ketil Malde et al. | 2506.17186v1 | null |
| 2025-06-20 | Fault Tolerance by Construction | Benjamin Rodatz et al. | 2506.17181v1 | null |
| 2025-06-20 | Deep generative models as the probability transformation functions | Vitalii Bondar et al. | 2506.17171v1 | null |
| 2025-06-20 | Scaling limits for sample autocovariance operators of Hilbert space-valued linear processes | Marie-Christine Düker et al. | 2506.17168v1 | null |
| 2025-06-20 | Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network | Mahin Montasir Afif et al. | 2506.17165v1 | null |
| 2025-06-20 | The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making | Abinitha Gourabathina et al. | 2506.17163v1 | null |
| 2025-06-20 | Walking Fingerprinting Using Wrist Accelerometry During Activities of Daily Living in NHANES | Lily Koffman et al. | 2506.17160v1 | null |
| 2025-06-20 | Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation | Qing Xu et al. | 2506.17159v1 | null |
| 2025-06-20 | Do We Need Large VLMs for Spotting Soccer Actions? | Ritabrata Chakraborty et al. | 2506.17144v1 | null |
| 2025-06-20 | MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification | David Jacob Drexlin et al. | 2506.17140v1 | null |
| 2025-06-20 | On the Theory of Conditional Feature Alignment for Unsupervised Domain-Adaptive Counting | Zhuonan Liang et al. | 2506.17137v1 | null |
| 2025-06-20 | Semi-Supervised Multi-Modal Medical Image Segmentation for Complex Situations | Dongdong Meng et al. | 2506.17136v1 | null |
| 2025-06-20 | Dynamic Watermark Generation for Digital Images using Perimeter Gated SPAD Imager PUFs | Md Sakibur Sajal et al. | 2506.17134v1 | null |
| 2025-06-20 | Robust Training with Data Augmentation for Medical Imaging Classification | Josué Martínez-Martínez et al. | 2506.17133v1 | null |
| 2025-06-20 | Reassessing Code Authorship Attribution in the Era of Language Models | Atish Kumar Dipongkor et al. | 2506.17120v1 | null |
| 2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et al. | 2506.17119v1 | null |
| 2025-06-20 | A Vision for Trustworthy, Fair, and Efficient Socio-Technical Control using Karma Economies | Ezzat Elokda et al. | 2506.17115v1 | null |
| 2025-06-20 | MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation | Shoubin Yu et al. | 2506.17113v1 | null |
| 2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et al. | 2506.17110v1 | null |
| 2025-06-20 | TransDreamerV3: Implanting Transformer In DreamerV3 | Shruti Sadanand Dongare et al. | 2506.17103v1 | null |