2023-06-05 | Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents | David Kreuzer et al. | 2306.02815v1 | null
2023-06-03 | TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Sagar Chakraborty et al. | 2306.02142v1 | link
2023-06-02 | DocFormerv2: Local Features for Document Understanding | Srikar Appalaraju et al. | 2306.01733v1 | null
2023-06-01 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | Wenjin Wang et al. | 2306.00526v1 | link
2023-05-31 | Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model | Haisong Ding et al. | 2305.19543v1 | null
2023-05-30 | DuoSearch: A Novel Search Engine for Bulgarian Historical Documents | Angel Beshirov et al. | 2305.19392v1 | link
2023-05-29 | GlyphControl: Glyph Conditional Control for Visual Text Generation | Yukang Yang et al. | 2305.18259v1 | link
2023-05-28 | FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions | Noam Rotstein et al. | 2305.17718v1 | link
2023-05-27 | Exploring Better Text Image Translation with Multimodal Codebook | Zhibin Lan et al. | 2305.17415v2 | link
2023-05-27 | Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers | Valfride Nascimento et al. | 2305.17313v1 | link
2023-05-26 | People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts | Vít Novotný et al. | 2305.16718v1 | null
2023-05-24 | Quantifying Character Similarity with Vision Transformers | Xinmei Yang et al. | 2305.14672v1 | link
2023-05-21 | Measuring Intersectional Biases in Historical Documents | Nadav Borenstein et al. | 2305.12376v1 | link
2023-05-19 | XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages | Sebastian Ruder et al. | 2305.11938v2 | link
2023-05-18 | TextDiffuser: Diffusion Models as Text Painters | Jingye Chen et al. | 2305.10855v2 | link
2023-05-16 | Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | Shuwei Feng et al. | 2305.10448v1 | null
2023-05-16 | Mobile User Interface Element Detection Via Adaptively Prompt Tuning | Zhangxuan Gu et al. | 2305.09699v1 | link
2023-05-13 | On the Hidden Mystery of OCR in Large Multimodal Models | Yuliang Liu et al. | 2305.07895v2 | link
2023-05-12 | Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution | Jianfeng Kuang et al. | 2305.07498v1 | link
2023-05-11 | Combining OCR Models for Reading Early Modern Printed Books | Mathias Seuret et al. | 2305.07131v1 | link
2023-05-09 | E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation | Cong Ma et al. | 2305.05166v2 | link
2023-05-04 | Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation | Renshen Wang et al. | 2305.02577v1 | null
2023-05-03 | Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections | Ming Jiang et al. | 2305.02291v1 | null
2023-04-28 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Peng Gao et al. | 2304.15010v1 | link
2023-04-24 | DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents | Mohamed Dhouib et al. | 2304.12484v2 | null
2023-04-24 | ICDAR 2023 Competition on Reading the Seal Title | Wenwen Yu et al. | 2304.11966v2 | null
2023-04-17 | Multimodal Short Video Rumor Detection System Based on Contrastive Learning | Yuxing Yang et al. | 2304.08401v3 | null
2023-04-15 | TransDocs: Optical Character Recognition with word to word translation | Abhishek Bamotra et al. | 2304.07637v1 | link
2023-04-07 | Linking Representations with Multimodal Contrastive Learning | Abhishek Arora et al. | 2304.03464v2 | null
2023-04-07 | Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts | Queenie Luo et al. | 2304.03427v1 | null