Scaling Document Intelligence for One of Asia's Largest Banks
When I was working on a project for one of Asia's largest banks, I had the opportunity to lead the design and development of a large-scale, production-grade document processing system. The scale, complexity, and real-world impact of that system make it a project worth writing about in detail.
This post outlines the architecture, machine learning components, and infrastructure challenges involved in building an AI-powered system that processes over 2 million document pages daily, translating to roughly 1.46 TB of data every day.
Every day, the bank receives more than 2 million scanned document pages, ranging from ID cards and address proofs to complex, multi-page financial documents. These arrive in various formats, including TIFF, PDF, JPEG, and PNG. On average, each document is approximately 6 MB in size, resulting in a daily processing volume of roughly 1.46 TB.
The requirements were ambitious:
Redact PII (Personally Identifiable Information) reliably across all document types
Detect and crop various document formats (Aadhaar, PAN, passports, etc.)
Complete processing within 12 hours, every single day
Ensure zero downtime, given the system's role in mission-critical onboarding and verification workflows
I served as the lead architect and engineer, responsible for designing the end-to-end system, from model training to production deployment. This included both the AI models and the backend infrastructure required to handle this volume efficiently and reliably.
We built a PII masking engine that used OCR in combination with deep learning-based field detection. Here's how it worked:
OCR was performed using a hybrid of Tesseract and Google Vision API for redundancy and accuracy.
Custom-trained CNN-based models identified PII elements such as names, Aadhaar numbers, PAN numbers, phone numbers, and addresses.
The system achieved 99.8% precision when masking PII, even in low-quality documents and non-standard layouts.
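The CNN-based field detector itself is proprietary, but the masking step it feeds can be illustrated with a simplified, pattern-based sketch. The regexes below are my own illustrative approximations (real Aadhaar and PAN validation involves more, such as checksum digits), not the bank's actual rules:

```python
import re

# Illustrative patterns standing in for the CNN field detector's output.
# These are simplified assumptions, not production validation logic.
PII_PATTERNS = {
    "aadhaar": re.compile(r"\b\d{4}\s\d{4}\s\d{4}\b"),  # 12 digits, 4-4-4 groups
    "pan":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),   # e.g. ABCDE1234F
    "phone":   re.compile(r"\b[6-9]\d{9}\b"),           # Indian mobile numbers
}

def mask_pii(ocr_text: str) -> str:
    """Replace every detected PII span with 'X' characters of equal length,
    preserving whitespace so the redacted text keeps its original layout."""
    masked = ocr_text
    for pattern in PII_PATTERNS.values():
        masked = pattern.sub(lambda m: re.sub(r"\S", "X", m.group(0)), masked)
    return masked
```

Masking by character length rather than with a fixed token keeps the redacted text aligned with the original page, which matters when the output is overlaid back onto the scanned image.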
To handle the varying layouts and formats of scanned documents, we developed a document detection and cropping model:
Leveraged YOLOv5 for object detection, trained on annotated datasets covering a wide variety of document types and noise conditions.
Applied post-processing filters to improve edge detection and alignment.
Enabled accurate cropping, classification, and standardization of incoming documents.
This step ensured all downstream processes received clean, structured inputs—critical for audit and compliance.
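The post-processing step can be sketched independently of the detector. The helper below, with an assumed padding fraction, expands a YOLO-style bounding box slightly so document edges are not clipped, then clamps it to the image bounds:

```python
def refine_crop_box(box, image_w, image_h, pad_frac=0.02):
    """Expand a detector bounding box by a small margin and clamp it to the
    image bounds before cropping.

    box: (x1, y1, x2, y2) in pixels, as returned by an object detector
    pad_frac: fractional padding per side (assumed value; tune per dataset)
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    return (
        max(0, int(x1 - pad_x)),
        max(0, int(y1 - pad_y)),
        min(image_w, int(x2 + pad_x)),
        min(image_h, int(y2 + pad_y)),
    )
```

A small symmetric margin like this is a common guard against tight detector boxes cutting off the border of an ID card before downstream OCR runs.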
To meet the 12-hour SLA for processing over 2 million pages per day (~1.46 TB), we built a horizontally scalable, distributed architecture.
Key components:
Microservices architecture using FastAPI for modularity and ease of scaling
Task queueing system using Celery + RabbitMQ to enable distributed job execution
Kubernetes to handle autoscaling and deployment across multiple nodes
Batch processing orchestration with scheduled workers and watchdogs for error handling
Storage management optimized for large-scale I/O, with intelligent caching and expiry logic
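In production, retries and error handling lived in Celery and the watchdog workers; the underlying pattern can be sketched as a standalone decorator with exponential backoff (attempt counts and delays below are illustrative assumptions, not production values):

```python
import functools
import time

def with_retries(max_attempts=3, base_delay=1.0):
    """Retry a flaky processing step with exponential backoff.
    In our stack this role was played by Celery's retry machinery;
    this standalone version just illustrates the pattern."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # exhausted: surface to the watchdog/alerting
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

The key design choice is that the final failure is re-raised rather than swallowed, so the watchdog and alerting layer always sees jobs that genuinely cannot complete.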
We also implemented shadow mode deployments and canary rollouts to test model upgrades safely in production.
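A minimal sketch of the shadow-mode idea: the candidate model runs on the same traffic, but callers only ever see the incumbent model's output, and disagreements are logged for offline review (the function names here are hypothetical, not our actual service API):

```python
def serve_with_shadow(document, current_model, candidate_model, log_disagreement):
    """Run the candidate model in shadow mode: callers only ever receive the
    current model's output, while mismatches are recorded for analysis."""
    primary = current_model(document)
    try:
        shadow = candidate_model(document)
        if shadow != primary:
            log_disagreement(document, primary, shadow)
    except Exception:
        pass  # a failing shadow model must never affect production traffic
    return primary
```

Wrapping the shadow call in its own try/except is the essential part: an upgrade candidate that crashes should cost nothing but a missing comparison log, never a failed customer request.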
Since going live, the system has operated without a single unplanned downtime event. This was achieved by investing heavily in:
Observability using Prometheus + Grafana
Health checks, circuit breakers, and fallback logic
Retry strategies and alerting on latency/volume anomalies
Continuous benchmarking and load testing to maintain SLA margins
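The circuit breakers and fallback logic above can be sketched in a minimal form: after a run of consecutive failures, calls to a struggling dependency (for example, an OCR backend) fail fast until a cooldown elapses. The thresholds below are illustrative assumptions:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive errors the
    circuit opens and calls fail fast for `reset_timeout` seconds, giving the
    downstream service time to recover instead of being hammered by retries."""

    def __init__(self, max_failures=5, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

Failing fast while the circuit is open is what lets the caller invoke fallback logic (such as switching OCR providers) instead of queueing work behind a dead dependency.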
Accuracy, scale, and reliability can coexist if baked into the architecture from day one.
AI in production is more than models — orchestration, data flow, and observability are equally critical.
Real-world ML deployments demand robustness, especially when operating in regulated environments like banking.
This project was a strong reminder that delivering AI at scale is a multidisciplinary effort, involving machine learning, system design, DevOps, and data engineering. The solution now processes hundreds of millions of document pages per year, enabling secure and efficient onboarding and verification workflows.
It's one thing to train a model that works in a notebook. It's something else entirely to deploy a system that works every day, under load, with no room for error. And that's what makes this project one of the most fulfilling parts of my professional career.
Abhishek Vishwakarma
July 2025

