GitHub - zhangliang-04/mPLUG-DocOwl: mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

The Powerful Multi-modal LLM Family
for OCR-free Document Understanding

Alibaba Group

News

🔥🔥🔥 [2024.4.26] We release the arXiv paper of TinyChart, a SOTA 3B Multimodal LLM for Chart Understanding (ChartQA: 83.6 > Gemini-Ultra 80.8 > GPT4V 78.5). Code, models, and data will be released in TinyChart.
🔥🔥🔥 [2024.4.3] We have built demos of DocOwl1.5 on both ModelScope and HuggingFace 🤗, powered by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
🔥🔥 [2024.3.28] We release the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), code, and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope.
🔥 [2024.3.20] We release the arXiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM for OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
[2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope, containing 447k high-resolution diagram images and corresponding paragraph analysis.
[2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
[2023.10.10] Our paper UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model has been accepted by EMNLP 2023.

[2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
[2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.

Models

mPLUG-DocOwl1.5 (arXiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
TinyChart (arXiv 2024) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
mPLUG-PaperOwl (arXiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
mPLUG-DocOwl (arXiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Online Demo

Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.

