
Tencent Open-Sources HunyuanOCR, A Lightweight Commercial-Grade Vision Model
SHENZHEN – Tencent’s Hunyuan large model team has officially released and open-sourced HunyuanOCR, a specialized lightweight vision-language model for optical character recognition (OCR) with just 1 billion parameters.
The model combines a native Vision Transformer (ViT) architecture with a lightweight large language model (LLM), delivering commercial-level performance in text detection, document parsing, and information extraction. It recently won first place in the small-model track of the ICDAR 2025 DIMT challenge and achieved state-of-the-art results on OCRBench among models with fewer than 3B parameters.
HunyuanOCR introduces three key breakthroughs:
- Unified multitasking capability – supporting text detection, complex layout analysis, open-field information extraction, and image translation within a single efficient framework
- End-to-end architecture – eliminating traditional preprocessing pipelines and reducing error accumulation
- Reinforcement learning optimization – demonstrating that RL can significantly enhance performance across multiple OCR tasks
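To make the error-accumulation point concrete, here is a minimal illustrative sketch (not Tencent's code). It assumes a hypothetical cascaded OCR pipeline of three stages, each 95% accurate, whose errors are independent. Under that assumption a page comes out correct only if every stage succeeds, so the stage accuracies multiply. The stage names, the 0.95 figures and the `pipeline_accuracy` helper are all hypothetical.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Probability that every stage in a cascade succeeds.

    Assumes (hypothetically) that each stage fails independently,
    so the end-to-end accuracy is the product of per-stage accuracies.
    """
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies for a traditional multi-stage OCR pipeline.
stages = {"detection": 0.95, "recognition": 0.95, "layout": 0.95}

print(round(pipeline_accuracy(stages.values()), 4))
```

Three stages at 95% each leave only about 86% end-to-end accuracy. A single end-to-end model is intended to avoid this kind of compounding, although real pipeline errors are rarely fully independent.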
The model has gained rapid community traction, ranking among the top four trending models on Hugging Face and receiving over 700 stars on GitHub within a short period. It has also been integrated into the vLLM inference engine.
Available now on Hugging Face and ModelScope, HunyuanOCR provides researchers and developers with a powerful, deployable OCR solution that balances high accuracy with computational efficiency – particularly valuable for edge deployment and industrial applications.




