DeepSeek OCR - AI-Powered Text Extraction
The world's first online OCR tool powered by DeepSeek's vision-language model. 97% accuracy with ultra-low token consumption. Convert documents to Markdown, extract text from images, and parse complex layouts effortlessly.
Experience DeepSeek OCR Live
Upload your images and see how DeepSeek OCR performs in real-time
💡 Tip: This demo is powered by Hugging Face Spaces. Try uploading different types of images to see the OCR capabilities.
Industry-Leading OCR Performance
DeepSeek OCR delivers exceptional accuracy and efficiency through cutting-edge vision-language technology
Accuracy
97%
Text extraction accuracy, recovering 600-1000+ text tokens per page from compressed vision tokens
Token Efficiency
100
Tokens per page (vs GOT-OCR2.0's 256 tokens)
Processing Speed
200K+
Pages per day on single A100-40G GPU

Revolutionary Vision-as-Compression Technology
DeepSeek OCR pioneers the use of vision as a long-context compression medium, achieving 10× lossless and 20× usable compression ratios
- Vision-as-Compression: First systematic proof that the vision modality can serve as a text-compression medium - recover 600-1000+ text tokens from just 64-100 vision tokens
- Custom Vision Encoder: DeepEncoder combines window + global attention with a 16× compression structure, optimized for optical compression rather than visual understanding
- Production-Ready: Not just research - a plug-and-play production model with built-in multilingual support, chart parsing, and formula recognition
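As a quick sanity check on the ratios quoted above, the figures in the text (64-100 vision tokens recovering 600-1000+ text tokens) can be plugged into a one-line calculation:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that encode them."""
    return text_tokens / vision_tokens

# Figures quoted above: 64-100 vision tokens recover 600-1000+ text tokens.
low = compression_ratio(600, 64)     # ≈ 9.4x
high = compression_ratio(1000, 100)  # = 10.0x
print(f"compression: {low:.1f}x - {high:.1f}x")
```

Both endpoints land near the claimed ~10× "lossless" regime; the 20× figure corresponds to pushing more text through the same vision-token budget at reduced fidelity.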
How to Use DeepSeek OCR
Four ways to leverage DeepSeek OCR - choose what works best for your workflow
Online Tool (Coming Soon)
Upload your image or PDF, get instant OCR results. No installation required. Free tier: 10 conversions/day.
Python API (Transformers)
Install via pip, load the model with Transformers, and call its infer() method. Perfect for simple scripts and prototyping. Supports CUDA acceleration.
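A minimal sketch of the Transformers path described above. The checkpoint name, prompt format, and infer() signature are assumptions based on the official model card - verify them against the DeepSeek-OCR repository before use. Imports are done lazily inside the function so the sketch can be defined without a GPU or the libraries installed:

```python
def ocr_image(image_path: str, output_dir: str = "./output") -> str:
    """Run DeepSeek OCR on one image via Hugging Face Transformers (sketch).

    Assumes the deepseek-ai/DeepSeek-OCR checkpoint and its custom infer()
    helper (loaded via trust_remote_code); check the repo for the exact API.
    """
    # Lazy imports: keep the sketch loadable without torch/transformers.
    from transformers import AutoModel, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-OCR"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model = model.cuda().eval()

    # Document-to-Markdown prompt as shown in the model card (assumption).
    prompt = "<image>\nConvert the document to markdown."
    return model.infer(tokenizer, prompt=prompt, image_file=image_path,
                       output_path=output_dir)
```

For a quick one-off, `ocr_image("invoice.png")` would write the Markdown result under `./output` and return it as a string.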
vLLM Batch Processing
High-performance batch processing with ~2500 tokens/s on A100-40G. Ideal for production workloads and large-scale document processing.
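For the vLLM path, the shape of a batch job looks roughly like the sketch below. vLLM's LLM/SamplingParams classes are real, but the exact multimodal input format for DeepSeek-OCR is an assumption - consult the repository's vLLM recipe for the supported version:

```python
def batch_ocr(image_paths: list[str]) -> list[str]:
    """Batch OCR with vLLM (sketch). The multi_modal_data input shape is
    an assumption; see the DeepSeek-OCR repo for the supported recipe."""
    from vllm import LLM, SamplingParams  # lazy import: needs GPU + vLLM

    llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)
    params = SamplingParams(temperature=0.0, max_tokens=2048)
    requests = [
        {"prompt": "<image>\nConvert the document to markdown.",
         "multi_modal_data": {"image": path}}
        for path in image_paths
    ]
    outputs = llm.generate(requests, params)
    return [o.outputs[0].text for o in outputs]
```

Batching is where the ~2500 tokens/s figure comes from: vLLM keeps the GPU saturated across many pages instead of processing them one at a time.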
Self-Hosted Deployment
Deploy on your own infrastructure for maximum privacy and control. Supports Docker, Kubernetes, and cloud platforms.
Why Choose DeepSeek OCR?
Built on cutting-edge research with practical benefits for real-world use cases



Comprehensive OCR Capabilities
From simple text extraction to complex document parsing - DeepSeek OCR handles it all
Document to Markdown
Convert documents to structured Markdown with preserved layouts, tables, and formatting. Perfect for content migration and documentation.
Multi-Language Support
Built-in support for multiple languages with high accuracy. Process documents in English, Chinese, Japanese, and more.
Chart & Figure Parsing
Extract data from charts, diagrams, and figures. Understand visual elements beyond just text extraction.
Formula Recognition
Parse mathematical formulas, chemical equations, and geometric notations. Ideal for academic and scientific documents.
Multiple Resolution Modes
Tiny (64 tokens), Small (100 tokens), Base (256 tokens), Large (400 tokens), and Gundam mode for complex documents.
API & CLI Support
Integrate via Python API, use vLLM for high-performance batch processing, or try our online tool for quick tasks.
Real-World Use Cases
DeepSeek OCR excels at processing complex documents where traditional OCR fails

Academic Research Papers
Extract full text, mathematical formulas (LaTeX), chemical equations, and figure captions from research papers. Ideal for literature review and knowledge management. Example: Process a 100-page PhD thesis in ~2 minutes on an A100-40G, with ~95% formula recognition accuracy.

Technical Documentation
Convert technical manuals, API documentation, and code-heavy documents to structured Markdown. Preserves table structures, code blocks, and hierarchical headings. Perfect for migrating legacy documentation to modern formats or building searchable knowledge bases.

Multilingual Business Documents
Process international contracts, invoices, and reports with mixed English-Chinese-Japanese text. No manual language switching required. The vision-language model understands context across languages, maintaining accuracy even when terms are mixed (e.g., technical terms in English within Chinese documents).
Simple, Transparent Pricing
Start free, upgrade when you need more. No hidden costs.
Free Tier
Perfect for trying out DeepSeek OCR and small projects
- 10 conversions per day
- All resolution modes (Tiny to Large)
- Basic OCR + Document to Markdown
- Community support via GitHub
Pro Plan
For professionals and teams with higher volume needs
- Unlimited conversions
- Gundam mode for complex documents
- API access with higher rate limits
- Priority support
- Advanced features (batch processing, webhooks)
Frequently Asked Questions
Everything you need to know about DeepSeek OCR - based on official documentation and real-world integration experience
How does DeepSeek OCR compare to Tesseract and PaddleOCR?
DeepSeek OCR uses a vision-language model (VLM) for context-aware OCR, while Tesseract and PaddleOCR are traditional OCR pipelines. Key differences: (1) Accuracy: DeepSeek excels at complex layouts (tables, formulas, mixed languages) with 97% accuracy vs Tesseract's ~85% on complex documents. (2) Token efficiency: 100 tokens/page vs PaddleOCR's higher processing overhead. (3) Hardware: requires a GPU (8GB+ VRAM), whereas Tesseract runs CPU-only. (4) Context understanding: can correct OCR errors using surrounding text. From hands-on experience integrating DeepSeek models at Feishu, VLM-based OCR is worth the GPU investment for production document processing.
What's the difference between resolution modes (Tiny, Small, Base, Large, Gundam)?
Resolution modes balance token consumption vs accuracy: Tiny (512×512, 64 tokens) - simple receipts/notes with clear text; Small (640×640, 100 tokens) - standard documents, recommended for most use cases; Base (1024×1024, 256 tokens) - complex layouts with tables/charts; Large (1280×1280, 400 tokens) - high-resolution scanned documents; Gundam (dynamic n×640×640 + 1×1024×1024) - academic papers with dense formulas and figures. Pro tip: Start with Small mode and upgrade only when accuracy drops below requirements. This saves significant API costs without sacrificing quality.
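The mode-to-token mapping above makes budgeting straightforward. A small stdlib-only helper (Gundam is excluded because its token count is dynamic):

```python
# Vision-token cost per page for each fixed resolution mode, as listed above.
MODE_TOKENS = {"tiny": 64, "small": 100, "base": 256, "large": 400}

def token_budget(pages: int, mode: str = "small") -> int:
    """Total vision tokens needed to process `pages` pages in a given mode."""
    return pages * MODE_TOKENS[mode]

# A 100-page document in Small mode costs 10,000 vision tokens;
# the same document in Large mode costs 4x that.
print(token_budget(100, "small"))  # 10000
print(token_budget(100, "large"))  # 40000
```

This is why starting in Small mode pays off: moving a whole batch from Small to Large quadruples token consumption, so upgrade only the documents that actually need it.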
Is DeepSeek OCR really free and open source?
Yes, 100% open source! The 3B parameter model is available on GitHub (https://github.com/deepseek-ai/DeepSeek-OCR) and Hugging Face under a permissive license. You can: (1) Self-host on your infrastructure (no API costs), (2) Modify the model for your specific needs, (3) Use commercially without licensing fees. This website's online tool offers a free tier (10 conversions/day) for quick tasks. For production use, consider self-hosting with vLLM for maximum cost efficiency (~$0.001 per page on cloud GPUs vs ~$0.01-0.05 for commercial OCR APIs).
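The per-page figures quoted above translate into monthly costs with simple arithmetic. A sketch using those numbers (the 5,000 pages/day workload is an illustrative assumption):

```python
def monthly_cost(pages_per_day: int, per_page_usd: float, days: int = 30) -> float:
    """Monthly OCR spend at a flat per-page rate."""
    return pages_per_day * per_page_usd * days

# Figures quoted above: ~$0.001/page self-hosted vs $0.01-0.05 commercial.
self_hosted = monthly_cost(5000, 0.001)      # ≈ $150/month
commercial_low = monthly_cost(5000, 0.01)    # ≈ $1,500/month
commercial_high = monthly_cost(5000, 0.05)   # ≈ $7,500/month
```

At even modest volume, the 10-50× per-page gap is what makes self-hosting with vLLM the cost-efficient choice for production.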
What are the hardware requirements for self-hosting?
GPU requirements: Minimum: 8GB VRAM (RTX 3070, RTX 4060 Ti) for basic inference at ~5-10 pages/min. Recommended: 16GB+ VRAM (RTX 4090, A100-40G) for production at ~100-200 pages/min. Enterprise: Multi-GPU setup (2-4× A100) for 200K+ pages/day. Software: CUDA 11.8+, PyTorch 2.6.0, vLLM 0.8.5+ for optimal throughput. CPU inference is possible but 50-100× slower (not recommended). Cloud options: AWS (p3/p4 instances), GCP (A100 VMs), Azure (NCv3 series). From practical experience, a single RTX 4090 handles most small-to-medium workloads cost-effectively.
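The pages/min rates above can be converted to daily capacity to check which GPU tier a workload needs (assuming sustained 24/7 utilization, which real deployments rarely hit):

```python
def pages_per_day(pages_per_min: float, utilization: float = 1.0) -> int:
    """Daily throughput at a sustained pages/min rate and duty cycle."""
    return int(pages_per_min * 60 * 24 * utilization)

# Rates quoted above: ~5-10 pages/min on an 8GB card,
# ~100-200 pages/min on a 16GB+ production GPU.
print(pages_per_day(10))   # 14400  (8GB-class ceiling)
print(pages_per_day(150))  # 216000 (A100-class, matching the 200K+/day figure)
```

So an 8GB card tops out in the low tens of thousands of pages per day, while a single production GPU at ~150 pages/min already clears the headline 200K+ pages/day number.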
Ready to Experience Next-Gen OCR?
Start converting documents with DeepSeek OCR today. Free tier available - no credit card required.