🚀 New: Based on the DeepSeek OCR 3B Model - Open Source!

DeepSeek OCR - AI-Powered Text Extraction

The world's first online OCR tool powered by DeepSeek's vision-language model. 97% accuracy with ultra-low token consumption. Convert documents to Markdown, extract text from images, and parse complex layouts effortlessly.

🚀 Try It Now

Experience DeepSeek OCR Live

Upload your images and see how DeepSeek OCR performs in real-time

💡 Tip: This demo is powered by Hugging Face Spaces. Try uploading different types of images to see the OCR capabilities.

Performance

Industry-Leading OCR Performance

DeepSeek OCR delivers exceptional accuracy and efficiency through cutting-edge vision-language technology

Accuracy

97%

Text extraction accuracy while recovering 600-1000+ text tokens per page from compressed vision tokens

Token Efficiency

100

Tokens per page (vs GOT-OCR2.0's 256 tokens)

Processing Speed

200K+

Pages per day on single A100-40G GPU

Revolutionary Vision-as-Compression Technology

DeepSeek OCR pioneers the use of vision as a long-context compression medium, achieving near-lossless compression at 10× and usable accuracy at up to 20× compression ratios

  • Vision-as-Compression
    First systematic proof that vision modality can serve as text compression medium - recover 600-1000+ text tokens from just 64-100 vision tokens
  • Custom Vision Encoder
    DeepEncoder combines window + global attention with 16× compression structure, optimized for optical compression rather than visual understanding
  • Production-Ready
    Not just research - a plug-and-play production model with built-in multilingual support, chart parsing, and formula recognition
Getting Started

How to Use DeepSeek OCR

Four ways to leverage DeepSeek OCR - choose what works best for your workflow

1

Online Tool (Coming Soon)

Upload your image or PDF, get instant OCR results. No installation required. Free tier: 10 conversions/day.

2

Python API (Transformers)

Install via pip, load the model, and call the infer() method. Perfect for simple scripts and prototyping. Supports CUDA acceleration.
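A minimal sketch of that Transformers workflow. The model id, prompt format, and infer() signature follow the DeepSeek-OCR model card and may change between releases, so treat this as illustrative rather than canonical:

```python
# Sketch of single-image OCR via Hugging Face Transformers.
# Requires a CUDA GPU (8GB+ VRAM); install with: pip install torch transformers
MODEL_ID = "deepseek-ai/DeepSeek-OCR"
# Grounding prompt that asks the model for structured Markdown output.
PROMPT = "<image>\n<|grounding|>Convert the document to markdown."

def run_ocr(image_path: str, output_dir: str = "output"):
    # Imports deferred so the heavy dependencies load only when OCR runs.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID, trust_remote_code=True, use_safetensors=True
    )
    model = model.eval().cuda().to(torch.bfloat16)
    # base_size/image_size map onto the resolution modes described on this
    # page; 1024/640 with crop_mode=True is the dynamic-tiling setting.
    return model.infer(
        tokenizer,
        prompt=PROMPT,
        image_file=image_path,
        output_path=output_dir,
        base_size=1024,
        image_size=640,
        crop_mode=True,
    )
```

Calling run_ocr("invoice.png") downloads the 3B checkpoint on first use and writes the Markdown result under output/.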

3

vLLM Batch Processing

High-performance batch processing with ~2500 tokens/s on A100-40G. Ideal for production workloads and large-scale document processing.

4

Self-Hosted Deployment

Deploy on your own infrastructure for maximum privacy and control. Supports Docker, Kubernetes, and cloud platforms.

Advantages

Why Choose DeepSeek OCR?

Built on cutting-edge research with practical benefits for real-world use cases

100 tokens/page vs 256 tokens (GOT-OCR2.0) - save up to 60% on API costs while maintaining SOTA accuracy. Ideal for high-volume document processing.
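The headline saving follows directly from the two token counts quoted above (it works out to roughly 61%, which the page rounds to "up to 60%"):

```python
# Token-cost savings of DeepSeek OCR (100 tokens/page) vs
# GOT-OCR2.0 (256 tokens/page), using the figures quoted above.
DEEPSEEK_TOKENS = 100
GOT_OCR2_TOKENS = 256

savings = 1 - DEEPSEEK_TOKENS / GOT_OCR2_TOKENS
print(f"{savings:.0%}")  # 61%
```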

Comprehensive OCR Capabilities

From simple text extraction to complex document parsing - DeepSeek OCR handles it all

Document to Markdown

Convert documents to structured Markdown with preserved layouts, tables, and formatting. Perfect for content migration and documentation.

Multi-Language Support

Built-in support for multiple languages with high accuracy. Process documents in English, Chinese, Japanese, and more.

Chart & Figure Parsing

Extract data from charts, diagrams, and figures. Understand visual elements beyond just text extraction.

Formula Recognition

Parse mathematical formulas, chemical equations, and geometric notations. Ideal for academic and scientific documents.

Multiple Resolution Modes

Tiny (64 tokens), Small (100 tokens), Base (256 tokens), Large (400 tokens), and Gundam mode for complex documents.

API & CLI Support

Integrate via Python API, use vLLM for high-performance batch processing, or try our online tool for quick tasks.

Simple, Transparent Pricing

Start free, upgrade when you need more. No hidden costs.

Free Tier

$0 forever

Perfect for trying out DeepSeek OCR and small projects

  • 10 conversions per day
  • All resolution modes (Tiny to Large)
  • Basic OCR + Document to Markdown
  • Community support via GitHub

Pro Plan

$9.99 per month

For professionals and teams with higher volume needs

  • Unlimited conversions
  • Gundam mode for complex documents
  • API access with higher rate limits
  • Priority support
  • Advanced features (batch processing, webhooks)
FAQ

Frequently Asked Questions

Everything you need to know about DeepSeek OCR - based on official documentation and real-world integration experience

1

How does DeepSeek OCR compare to Tesseract and PaddleOCR?

DeepSeek OCR uses a vision-language model (VLM) for context-aware OCR, while Tesseract and PaddleOCR are traditional pattern-matching engines. Key differences:

  • Accuracy: DeepSeek excels at complex layouts (tables, formulas, mixed languages) with 97% accuracy vs Tesseract's ~85% on complex documents
  • Token efficiency: 100 tokens/page vs PaddleOCR's higher processing overhead
  • Hardware: requires a GPU (8GB+ VRAM) vs CPU-only for Tesseract
  • Context understanding: can correct OCR errors using surrounding text context

From experience integrating DeepSeek models at Feishu, VLM-based OCR is worth the GPU investment for production document processing.

2

What's the difference between resolution modes (Tiny, Small, Base, Large, Gundam)?

Resolution modes balance token consumption against accuracy:

  • Tiny (512×512, 64 tokens) - simple receipts/notes with clear text
  • Small (640×640, 100 tokens) - standard documents; recommended for most use cases
  • Base (1024×1024, 256 tokens) - complex layouts with tables/charts
  • Large (1280×1280, 400 tokens) - high-resolution scanned documents
  • Gundam (dynamic n×640×640 + 1×1024×1024) - academic papers with dense formulas and figures

Pro tip: start with Small mode and upgrade only when accuracy drops below your requirements. This saves significant API costs without sacrificing quality.
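The trade-off can be encoded in a small lookup table. The mode names, resolutions, and token counts come from this FAQ; the pick_mode helper is purely illustrative and not part of the DeepSeek OCR API (Gundam is omitted because its token count is dynamic):

```python
# Token budget per fixed resolution mode (figures from the FAQ above).
MODES = {
    "Tiny":  {"resolution": (512, 512),   "tokens": 64},
    "Small": {"resolution": (640, 640),   "tokens": 100},
    "Base":  {"resolution": (1024, 1024), "tokens": 256},
    "Large": {"resolution": (1280, 1280), "tokens": 400},
}

def pick_mode(token_budget: int) -> str:
    """Pick the highest-resolution fixed mode that fits the budget.

    Illustrative helper only, not part of the DeepSeek OCR API.
    """
    affordable = [m for m, v in MODES.items() if v["tokens"] <= token_budget]
    if not affordable:
        raise ValueError("Budget below the cheapest mode (Tiny, 64 tokens)")
    return max(affordable, key=lambda m: MODES[m]["tokens"])

print(pick_mode(100))  # Small
print(pick_mode(300))  # Base
```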

3

Is DeepSeek OCR really free and open source?

Yes, 100% open source! The 3B parameter model is available on GitHub (https://github.com/deepseek-ai/DeepSeek-OCR) and Hugging Face under a permissive license. You can: (1) Self-host on your infrastructure (no API costs), (2) Modify the model for your specific needs, (3) Use commercially without licensing fees. This website's online tool offers a free tier (10 conversions/day) for quick tasks. For production use, consider self-hosting with vLLM for maximum cost efficiency (~$0.001 per page on cloud GPUs vs ~$0.01-0.05 for commercial OCR APIs).

4

What are the hardware requirements for self-hosting?

GPU requirements:

  • Minimum: 8GB VRAM (RTX 3070, RTX 4060 Ti) - basic inference at ~5-10 pages/min
  • Recommended: 16GB+ VRAM (RTX 4090, A100-40G) - production at ~100-200 pages/min
  • Enterprise: multi-GPU setup (2-4× A100) - 200K+ pages/day

Software: CUDA 11.8+, PyTorch 2.6.0, vLLM 0.8.5+ for optimal throughput. CPU inference is possible but 50-100× slower (not recommended). Cloud options: AWS (p3/p4 instances), GCP (A100 VMs), Azure (NCv3 series). In practice, a single RTX 4090 handles most small-to-medium workloads cost-effectively.
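The per-day headline is consistent with the per-minute rates quoted in this answer, as a quick arithmetic sketch shows:

```python
# Sustained daily throughput implied by the pages/min rates quoted above.
def pages_per_day(pages_per_min: float) -> int:
    """Extrapolate a per-minute rate to a 24-hour day."""
    return int(pages_per_min * 60 * 24)

print(pages_per_day(100))  # 144000
print(pages_per_day(200))  # 288000
# The 200K+ pages/day figure sits within this 144K-288K range.
```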

Ready to Experience Next-Gen OCR?

Start converting documents with DeepSeek OCR today. Free tier available - no credit card required.

DeepSeek OCR - Free Online OCR Tool | Vision-Language Model Text Extraction