Chart
  ahmed-masry/ChartQA • Updated Jun 22, 2024 • 32.7k • 1.73k • 31
  oroikon/chart_captioning • Updated Oct 8, 2023 • 8.82k • 80 • 12
  heegyu/chart2text_statista • Updated Oct 12, 2023 • 34.8k • 1.61k • 9
  nourheshamshaheen/typed_final_chart_to_table • Updated Nov 12, 2023 • 2.81k • 51 • 6
Document Understanding Models
  Mizukiluke/ureader-instruction-1.0 • Updated Oct 13, 2023 • 24.5k • 70 • 15
Captioning
  docling-project/USPTO-30K • Updated Aug 24, 2023 • 30k • 346 • 10
  MMInstruction/ArxivCap • Updated Oct 3, 2024 • 573k • 44.3k • 58
  mPLUG/M-Paper • Updated Jan 13, 2024 • 499 • 13
DocQA
  jp1924/DocStruct4M • Updated Feb 5, 2025 • 3.05M • 48 • 4
  howard-hou/OCR-VQA • Updated Apr 24, 2023 • 208k • 2.27k • 60
  vikhyatk/docmatix-single • Updated Jul 19, 2024 • 565k • 164 • 6
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 170 • 38
Page to MD
  A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR
  v1v1d/Arxiv_MD_v2_2k • Updated Jun 24, 2024 • 3.04k • 16
  v1v1d/Arxiv_MD_v2 • Updated Jun 24, 2024 • 14.2k • 56
  v1v1d/Arxiv_MD_v1_1k • Updated Jun 23, 2024 • 1.14k • 14
  v1v1d/Arxiv_MD_v1 • Updated Jun 18, 2024 • 9.96k • 45
OCR
  wendlerc/RenderedText • Updated Oct 23, 2025 • 12M • 9.76k • 58
  Salesforce/blip3-ocr-200m • Updated Feb 3, 2025 • 96M • 1.67k • 44
  openpecha/OCR-Google_Books • Updated Oct 20, 2025 • 751k • 276
  openpecha/OCR-Norbuketaka • Updated Oct 14, 2025 • 2.24M • 27
Table Extraction
  docling-project/PubTables-1M_OTSL • Updated Aug 31, 2023 • 1.88M • 2.22k • 7
  docling-project/PubTabNet_OTSL • Updated Aug 31, 2023 • 395k • 2.49k • 5
  docling-project/FinTabNet_OTSL • Updated Aug 31, 2023 • 109k • 630 • 7
Layout Detection
  docling-project/DocLayNet-v1.1 • Updated Sep 1, 2023 • 63.5k • 2.04k • 27
  docling-project/DocLayNet • Updated Jan 25, 2023 • 666 • 140
  vikp/doclaynet_processed • Updated Nov 30, 2023 • 80.9k • 702 • 6
  psyche/publaynet • Updated Jul 30, 2024 • 347k • 135
VQA
  wyu1/Leopard-Instruct • Updated Nov 8, 2024 • 1.03M • 65.2k • 65
  neulab/PangeaInstruct • Updated Feb 2, 2025 • 315 • 86
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 170 • 38
  vidore/arxivqa_train • Updated Jun 20, 2025 • 95k • 360
Latex Extract
  A dataset collection of image-text pairs, where each image contains mathematical formulas, and each corresponding text provides the relevant LaTeX
  v1v1d/Latexify_v1_clean • Updated Jul 29, 2024 • 11k • 33 • 1
  v1v1d/Latexify_v1 • Updated Jul 29, 2024 • 234k • 19 • 1
  OleehyO/latex-formulas • Updated Aug 13, 2025 • 1.56M • 564 • 101
  unsloth/LaTeX_OCR • Updated Nov 21, 2024 • 76.3k • 4.37k • 82