AI Engineering · Production

Intelligent Document Processing Platform

AI-Powered Multilingual OCR & Document Intelligence

Converts scanned PDFs and photographed forms into clean, structured Markdown across 10+ languages, including complex scripts such as Urdu, Arabic and Amharic, with zero manual intervention.

FastAPI · Celery · Redis · Docling · RapidOCR · Tesseract 5 · TATR · Gemini 2.0 · OpenCV · Next.js
Problem Statement

Enterprises process thousands of documents monthly. Manual keying averages 6–8 minutes per page and scales linearly with volume, while standard OCR fails on non-Latin scripts and discards tables, headings and reading order.

  • Manual data entry is slow, costly and scales linearly with volume.
  • Standard OCR tools fail on Urdu, Arabic, Amharic and Khmer scripts.
  • Raw OCR loses tables, reading order and headings, flattening semantic context.
Headline Outcomes
  • Automated: manual keying eliminated (previously 6–8 min per page)
  • 10+: languages supported
  • ~300 MB RAM: web process footprint

The Solution

A production-grade, three-layer AI pipeline (layout detection → OCR → Markdown assembly) exposed over an async REST API, with intelligent Tier-4 routing to Gemini 2.0 Flash when local models hit their accuracy ceiling.
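Markdown assembly has to turn each detected table grid into a GitHub-Flavored Markdown (GFM) pipe table. A minimal sketch, assuming the table stage has already produced a list of rows of cell strings with the first row as the header; `escape_cell` and `to_gfm_table` are hypothetical names, not taken from the project.

```python
def escape_cell(text: str) -> str:
    # GFM cells cannot contain raw newlines or unescaped pipes.
    return text.strip().replace("|", "\\|").replace("\n", "<br>")


def to_gfm_table(rows: list[list[str]]) -> str:
    """Render rows (first row = header) as a GFM pipe table."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad ragged rows (common after OCR drops an empty cell) to a uniform width.
    padded = [[escape_cell(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * width) + "|",  # header/body delimiter row
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

For example, `to_gfm_table([["Item", "Qty"], ["Pen", "2"]])` yields a two-column table with a `|---|---|` delimiter row.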

An async five-stage pipeline behind a FastAPI HTTP 202 Accepted pattern keeps the web process lean (~300 MB RAM).

Two Celery queues isolate CPU-bound OCR (prefork) from I/O-bound Gemini calls (gevent).
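Queue isolation of this kind is typically expressed through Celery's `task_routes` setting plus one worker per queue, each started with a different execution pool. A sketch of a `celeryconfig` module, assuming Redis as the broker (it is in the stack) and hypothetical task and queue names (`pipeline.tasks.ocr_page`, `pipeline.tasks.gemini_page`, `ocr`, `llm`); the worker commands in the comments use real Celery CLI options, but the concurrency values are illustrative guesses.

```python
# celeryconfig.py: load with app.config_from_object("celeryconfig").
# Task and queue names below are hypothetical.

broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/1"

# Route each task type to its own queue so slow OCR never starves LLM calls.
task_routes = {
    "pipeline.tasks.ocr_page": {"queue": "ocr"},     # CPU-bound
    "pipeline.tasks.gemini_page": {"queue": "llm"},  # I/O-bound
}

# Reserve one task at a time per worker process, so long OCR jobs
# are not prefetched behind a process that is already busy.
worker_prefetch_multiplier = 1
task_acks_late = True  # re-deliver a page if its worker dies mid-task

# Start one worker per queue, each with the pool that suits its workload:
#   celery -A pipeline worker -Q ocr --pool=prefork --concurrency=4
#   celery -A pipeline worker -Q llm --pool=gevent --concurrency=100
```

Prefork gives each OCR task its own process (sidestepping the GIL), while gevent lets a single process keep many network-bound Gemini requests in flight.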

Docling Heron collapses the DocLayNet taxonomy into 6 canonical labels for accurate structure segmentation.
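Collapsing a taxonomy amounts to a lookup table from detector labels to the smaller set the Markdown writer understands. The page does not name the 6 canonical labels, so the grouping below (heading, text, list, table, figure, furniture) is a hypothetical illustration over the 11 DocLayNet classes; `Canonical` and `canonicalize` are likewise assumed names.

```python
from enum import Enum


class Canonical(str, Enum):
    HEADING = "heading"
    TEXT = "text"
    LIST = "list"
    TABLE = "table"
    FIGURE = "figure"
    FURNITURE = "furniture"  # page headers/footers, usually left out of the Markdown


# Hypothetical mapping of the 11 DocLayNet classes onto 6 canonical labels.
DOCLAYNET_TO_CANONICAL = {
    "Title": Canonical.HEADING,
    "Section-header": Canonical.HEADING,
    "Text": Canonical.TEXT,
    "Caption": Canonical.TEXT,
    "Footnote": Canonical.TEXT,
    "Formula": Canonical.TEXT,
    "List-item": Canonical.LIST,
    "Table": Canonical.TABLE,
    "Picture": Canonical.FIGURE,
    "Page-header": Canonical.FURNITURE,
    "Page-footer": Canonical.FURNITURE,
}


def canonicalize(label: str) -> Canonical:
    # Labels outside the table (e.g. extra classes a newer model emits) fall back to text.
    return DOCLAYNET_TO_CANONICAL.get(label, Canonical.TEXT)
```

Keeping the mapping in one table means a detector upgrade that adds classes only requires new entries here, not changes to the Markdown writer.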

Tier-4 routing: pages where ≥25% of characters belong to complex scripts bypass local OCR and go to Gemini 2.0 Flash, since local models hit their accuracy ceiling on these scripts.
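The 25% rule reduces to a ratio over letter characters. A sketch, assuming the ratio is computed on a text sample already available for the page (for example, from a quick first OCR pass); the Unicode block list and the names `complex_script_ratio` and `route_page` are assumptions covering Arabic script (Arabic, Urdu), Ethiopic (Amharic) and Khmer.

```python
# Unicode blocks treated as "complex script" here (an assumption, not the project's list).
COMPLEX_SCRIPT_RANGES = (
    (0x0600, 0x06FF),  # Arabic (also used for Urdu)
    (0x0750, 0x077F),  # Arabic Supplement
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
    (0x1200, 0x137F),  # Ethiopic (Amharic)
    (0x1780, 0x17FF),  # Khmer
)
TIER4_THRESHOLD = 0.25  # route to the LLM tier at >= 25% complex-script letters


def is_complex_script(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPLEX_SCRIPT_RANGES)


def complex_script_ratio(text: str) -> float:
    """Share of letters in complex scripts; digits, spaces and punctuation are ignored."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_complex_script(ch) for ch in letters) / len(letters)


def route_page(sample_text: str) -> str:
    """Return the hypothetical target tier for a page: 'gemini' or 'local'."""
    return "gemini" if complex_script_ratio(sample_text) >= TIER4_THRESHOLD else "local"
```

Counting only letters keeps invoices full of digits and punctuation from diluting the ratio; a bilingual page with two Arabic letters out of eight sits exactly at the threshold and is routed to Gemini.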

System Architecture

How the data flows

01 Upload & Probe: REST API, 100 MB limit
02 Language Detect: langdetect + Tesseract OSD
03 Layout Detect: Docling Heron (HuggingFace)
04 OCR + Tables: RapidOCR + Tesseract + TATR
05 Markdown Output: GFM tables + XY-cut order
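XY-cut orders regions by recursively splitting the page along whitespace gaps, so a two-column page is read column by column rather than line by line across both columns. A minimal sketch, assuming regions arrive as axis-aligned boxes in page coordinates with y growing downward; this variant cuts at the single widest gap on either axis, which is one common choice and not necessarily the one this pipeline uses. `Box`, `_widest_cut` and `xy_cut` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float
    label: str = ""


def _widest_cut(boxes: list[Box], axis: int) -> tuple[float, Optional[float]]:
    """Return (gap, cut position) of the widest empty band along axis 0 (x) or 1 (y)."""
    lo = (lambda b: b.x0) if axis == 0 else (lambda b: b.y0)
    hi = (lambda b: b.x1) if axis == 0 else (lambda b: b.y1)
    ordered = sorted(boxes, key=lo)
    reach = hi(ordered[0])  # furthest edge covered so far
    best_gap, best_pos = 0.0, None
    for b in ordered[1:]:
        gap = lo(b) - reach
        if gap > best_gap:
            best_gap, best_pos = gap, lo(b)
        reach = max(reach, hi(b))
    return best_gap, best_pos


def xy_cut(boxes: list[Box]) -> list[Box]:
    """Order boxes for reading with a recursive XY-cut at the widest whitespace gap."""
    if len(boxes) <= 1:
        return list(boxes)
    y_gap, y_pos = _widest_cut(boxes, axis=1)
    x_gap, x_pos = _widest_cut(boxes, axis=0)
    if y_pos is None and x_pos is None:
        # No clean cut (overlapping boxes): fall back to top-left ordering.
        return sorted(boxes, key=lambda b: (b.y0, b.x0))
    if y_gap >= x_gap:  # horizontal cut: top part first
        first = [b for b in boxes if b.y0 < y_pos]
        second = [b for b in boxes if b.y0 >= y_pos]
    else:  # vertical cut: left column first
        first = [b for b in boxes if b.x0 < x_pos]
        second = [b for b in boxes if b.x0 >= x_pos]
    return xy_cut(first) + xy_cut(second)
```

Cutting at the widest gap, rather than at every gap along one axis, keeps a small vertical misalignment between paragraphs in neighbouring columns from interleaving them: a full-width title is split off first, then the columns.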

Result 01: Eliminated expensive manual data-entry workflows at enterprise scale.

Result 02: Unlocked searchable, machine-readable content from legacy multilingual archives.

Result 03: Serves government, legal, research and healthcare digitisation use cases.

Available for new work

Have a backend, AI, or data problem worth solving?

From production APIs to self-hosted AI that kills per-call costs, let's scope it. I reply within one business day.