OCR

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2023-06-05 | Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents | David Kreuzer et.al. | 2306.02815v1 | null |
| 2023-06-03 | TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Sagar Chakraborty et.al. | 2306.02142v1 | link |
| 2023-06-02 | DocFormerv2: Local Features for Document Understanding | Srikar Appalaraju et.al. | 2306.01733v1 | null |
| 2023-06-01 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | Wenjin Wang et.al. | 2306.00526v1 | link |
| 2023-05-31 | Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model | Haisong Ding et.al. | 2305.19543v1 | null |
| 2023-05-30 | DuoSearch: A Novel Search Engine for Bulgarian Historical Documents | Angel Beshirov et.al. | 2305.19392v1 | link |
| 2023-05-29 | GlyphControl: Glyph Conditional Control for Visual Text Generation | Yukang Yang et.al. | 2305.18259v1 | link |
| 2023-05-28 | FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions | Noam Rotstein et.al. | 2305.17718v1 | link |
| 2023-05-27 | Exploring Better Text Image Translation with Multimodal Codebook | Zhibin Lan et.al. | 2305.17415v2 | link |
| 2023-05-27 | Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers | Valfride Nascimento et.al. | 2305.17313v1 | link |
| 2023-05-26 | People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts | Vít Novotný et.al. | 2305.16718v1 | null |
| 2023-05-24 | Quantifying Character Similarity with Vision Transformers | Xinmei Yang et.al. | 2305.14672v1 | link |
| 2023-05-21 | Measuring Intersectional Biases in Historical Documents | Nadav Borenstein et.al. | 2305.12376v1 | link |
| 2023-05-19 | XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages | Sebastian Ruder et.al. | 2305.11938v2 | link |
| 2023-05-18 | TextDiffuser: Diffusion Models as Text Painters | Jingye Chen et.al. | 2305.10855v2 | link |
| 2023-05-16 | Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | Shuwei Feng et.al. | 2305.10448v1 | null |
| 2023-05-16 | Mobile User Interface Element Detection Via Adaptively Prompt Tuning | Zhangxuan Gu et.al. | 2305.09699v1 | link |
| 2023-05-13 | On the Hidden Mystery of OCR in Large Multimodal Models | Yuliang Liu et.al. | 2305.07895v2 | link |
| 2023-05-12 | Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution | Jianfeng Kuang et.al. | 2305.07498v1 | link |
| 2023-05-11 | Combining OCR Models for Reading Early Modern Printed Books | Mathias Seuret et.al. | 2305.07131v1 | link |
| 2023-05-09 | E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation | Cong Ma et.al. | 2305.05166v2 | link |
| 2023-05-04 | Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation | Renshen Wang et.al. | 2305.02577v1 | null |
| 2023-05-03 | Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections | Ming Jiang et.al. | 2305.02291v1 | null |
| 2023-04-28 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Peng Gao et.al. | 2304.15010v1 | link |
| 2023-04-24 | DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents | Mohamed Dhouib et.al. | 2304.12484v2 | null |
| 2023-04-24 | ICDAR 2023 Competition on Reading the Seal Title | Wenwen Yu et.al. | 2304.11966v2 | null |
| 2023-04-17 | Multimodal Short Video Rumor Detection System Based on Contrastive Learning | Yuxing Yang et.al. | 2304.08401v3 | null |
| 2023-04-15 | TransDocs: Optical Character Recognition with word to word translation | Abhishek Bamotra et.al. | 2304.07637v1 | link |
| 2023-04-07 | Linking Representations with Multimodal Contrastive Learning | Abhishek Arora et.al. | 2304.03464v2 | null |
| 2023-04-07 | Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts | Queenie Luo et.al. | 2304.03427v1 | null |