OCR

Publish Date	Title	Authors	PDF (arXiv ID)	Code
2023-06-05	Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents	David Kreuzer et al.	2306.02815v1	null
2023-06-03	TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain	Sagar Chakraborty et al.	2306.02142v1	link
2023-06-02	DocFormerv2: Local Features for Document Understanding	Srikar Appalaraju et al.	2306.01733v1	null
2023-06-01	Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering	Wenjin Wang et al.	2306.00526v1	link
2023-05-31	Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model	Haisong Ding et al.	2305.19543v1	null
2023-05-30	DuoSearch: A Novel Search Engine for Bulgarian Historical Documents	Angel Beshirov et al.	2305.19392v1	link
2023-05-29	GlyphControl: Glyph Conditional Control for Visual Text Generation	Yukang Yang et al.	2305.18259v1	link
2023-05-28	FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions	Noam Rotstein et al.	2305.17718v1	link
2023-05-27	Exploring Better Text Image Translation with Multimodal Codebook	Zhibin Lan et al.	2305.17415v2	link
2023-05-27	Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers	Valfride Nascimento et al.	2305.17313v1	link
2023-05-26	People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts	Vít Novotný et al.	2305.16718v1	null
2023-05-24	Quantifying Character Similarity with Vision Transformers	Xinmei Yang et al.	2305.14672v1	link
2023-05-21	Measuring Intersectional Biases in Historical Documents	Nadav Borenstein et al.	2305.12376v1	link
2023-05-19	XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	Sebastian Ruder et al.	2305.11938v2	link
2023-05-18	TextDiffuser: Diffusion Models as Text Painters	Jingye Chen et al.	2305.10855v2	link
2023-05-16	Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding	Shuwei Feng et al.	2305.10448v1	null
2023-05-16	Mobile User Interface Element Detection Via Adaptively Prompt Tuning	Zhangxuan Gu et al.	2305.09699v1	link
2023-05-13	On the Hidden Mystery of OCR in Large Multimodal Models	Yuliang Liu et al.	2305.07895v2	link
2023-05-12	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	Jianfeng Kuang et al.	2305.07498v1	link
2023-05-11	Combining OCR Models for Reading Early Modern Printed Books	Mathias Seuret et al.	2305.07131v1	link
2023-05-09	E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation	Cong Ma et al.	2305.05166v2	link
2023-05-04	Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation	Renshen Wang et al.	2305.02577v1	null
2023-05-03	Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections	Ming Jiang et al.	2305.02291v1	null
2023-04-28	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	Peng Gao et al.	2304.15010v1	link
2023-04-24	DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents	Mohamed Dhouib et al.	2304.12484v2	null
2023-04-24	ICDAR 2023 Competition on Reading the Seal Title	Wenwen Yu et al.	2304.11966v2	null
2023-04-17	Multimodal Short Video Rumor Detection System Based on Contrastive Learning	Yuxing Yang et al.	2304.08401v3	null
2023-04-15	TransDocs: Optical Character Recognition with word to word translation	Abhishek Bamotra et al.	2304.07637v1	link
2023-04-07	Linking Representations with Multimodal Contrastive Learning	Abhishek Arora et al.	2304.03464v2	null
2023-04-07	Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts	Queenie Luo et al.	2304.03427v1	null