OCR-based document validation
A country may collect a set of predefined documents as evidence of identity, date of birth, address, marital status, residence status, etc. Currently, there are no steps to validate these documents on the server side.
We propose building an OCR-based validation stage. Validation may use a template for each document type together with OCR extraction techniques.
Reading Material
Registration Packet, ID Schema, Packet Manager
Registration Processor Stages, Vert.x, Apache Camel and Workflow XML, Tagging Feature
YouTube Videos
Classifier Stage, Validation Stage
Implementation Approach
A server-side classifier stage will examine the documents inside the packet and decide which of them need to be validated using OCR techniques. This decision will be recorded on the packet as tags, using the Packet Manager tagging feature.
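A minimal sketch of the classifier decision. The category codes (`POI`, `POB`, `POA`, `POR`), the document type names, the `OCR_SUPPORTED` set, and the tag key `OCR_VALIDATION_DOCUMENTS` are all hypothetical, and tag values are assumed to be plain strings. A real stage would read the documents through the Packet Manager and write the tags back with its tagging API, neither of which is shown here.

```python
from dataclasses import dataclass

# Hypothetical (category, type) pairs for which an OCR template exists.
OCR_SUPPORTED = {("POI", "passport"), ("POB", "passport"), ("POA", "utility_bill")}

# Hypothetical tag key under which the classifier records its decision.
OCR_TAG = "OCR_VALIDATION_DOCUMENTS"


@dataclass
class PacketDocument:
    field_name: str  # document field in the ID schema, e.g. "proofOfIdentity"
    category: str    # document category code, e.g. "POI"
    doc_type: str    # document type code, e.g. "passport"


def classify(documents: list[PacketDocument]) -> dict[str, str]:
    """Return the packet tags naming the documents that need OCR validation."""
    selected = [d.field_name for d in documents
                if (d.category, d.doc_type) in OCR_SUPPORTED]
    # Tag values are assumed to be plain strings, so field names are comma-joined.
    return {OCR_TAG: ",".join(selected)} if selected else {}


docs = [
    PacketDocument("proofOfIdentity", "POI", "passport"),
    PacketDocument("proofOfRelationship", "POR", "birth_certificate"),
]
print(classify(docs))  # {'OCR_VALIDATION_DOCUMENTS': 'proofOfIdentity'}
```

Returning no tag at all when nothing qualifies lets the workflow treat "tag absent" as "skip OCR validation".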
Based on the tags in the packet, the Camel workflow will call the OCR validation stage with the specific details of each document to be validated.
The OCR validation stage will validate each document against a known template for that document, using an OCR framework such as Tesseract. The template to use is determined by the document type and purpose codes.
Validation of a document includes checks specific to that document type, as well as checks based on the data extracted from the document, which must be compared against the corresponding data in the packet.
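The tag-driven routing decision can be sketched as follows. The event-bus addresses and the JSON message fields (`rid`, `documents`) are illustrative assumptions; in the actual system this condition would be expressed in the Camel workflow XML rather than in application code.

```python
import json

# Hypothetical event-bus addresses; real names would come from the workflow configuration.
OCR_STAGE_ADDRESS = "ocr-validation-bus-in"
DEFAULT_NEXT_ADDRESS = "next-stage-bus-in"
OCR_TAG = "OCR_VALIDATION_DOCUMENTS"  # same hypothetical tag key the classifier writes


def route(rid: str, tags: dict[str, str]) -> tuple[str, str]:
    """Choose the next stage for a packet and build the JSON message sent to it."""
    documents = [d for d in tags.get(OCR_TAG, "").split(",") if d]
    if not documents:
        # Nothing to validate with OCR: skip the stage entirely.
        return DEFAULT_NEXT_ADDRESS, json.dumps({"rid": rid})
    # Pass the specific documents so the OCR stage does not need to re-classify.
    return OCR_STAGE_ADDRESS, json.dumps({"rid": rid, "documents": documents})


address, message = route("10001", {OCR_TAG: "proofOfIdentity,proofOfAddress"})
print(address, message)
# ocr-validation-bus-in {"rid": "10001", "documents": ["proofOfIdentity", "proofOfAddress"]}
```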
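A sketch of template-driven extraction, assuming a template is a set of named regions (bounding boxes) with a regular expression that each field's text must match, and assuming the scanned image has already been aligned to the template. The `TEMPLATES` registry, its coordinates, and its patterns are hypothetical. The OCR engine is passed in as a function so the sketch runs without Tesseract; with the `pytesseract` and Pillow packages it could be, for example, `lambda img, box: pytesseract.image_to_string(img.crop(box))`.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in template pixel coordinates


@dataclass(frozen=True)
class FieldSpec:
    box: Box      # region of the aligned document image holding this field
    pattern: str  # regular expression the OCR text must fully match


# Hypothetical template registry keyed by (document type, purpose code).
TEMPLATES: dict[tuple[str, str], dict[str, FieldSpec]] = {
    ("passport", "POI"): {
        "surname": FieldSpec((120, 80, 600, 110), r"[A-Z][A-Z' -]*"),
        "dob": FieldSpec((120, 200, 360, 230), r"\d{2} [A-Z]{3} \d{4}"),
    },
}


def extract(doc_type: str, purpose: str, image: Any,
            ocr: Callable[[Any, Box], str]) -> dict[str, str]:
    """Read every templated field from an image already aligned to the template."""
    template = TEMPLATES.get((doc_type, purpose))
    if template is None:
        raise KeyError(f"no template for {doc_type}/{purpose}")
    fields = {}
    for name, spec in template.items():
        text = ocr(image, spec.box).strip()
        if not re.fullmatch(spec.pattern, text):
            raise ValueError(f"field {name!r} does not match the template: {text!r}")
        fields[name] = text
    return fields


# Stand-in OCR engine returning fixed text per region, so the sketch runs without Tesseract.
fake_text = {(120, 80, 600, 110): "DOE\n", (120, 200, 360, 230): "02 JAN 1990"}
print(extract("passport", "POI", None, lambda img, box: fake_text[box]))
# {'surname': 'DOE', 'dob': '02 JAN 1990'}
```

Rejecting text that does not fit the field pattern catches both OCR misreads and documents that do not match the expected template.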
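Comparing extracted values with packet data usually needs normalization first, since OCR output and registration data can differ in case, accents, spacing, and date format. The sketch below assumes the packet data has been flattened to single-language strings under `fullName` and `dateOfBirth` (in `YYYY/MM/DD` form), and that the document prints dates like `02 JAN 1990`; these field names and formats are assumptions to check against the actual ID schema.

```python
import unicodedata
from datetime import date, datetime


def normalize_name(name: str) -> str:
    """Fold case, accents and spacing so "JOSÉ  DOE" and "Jose Doe" compare equal."""
    # NFKD splits accented letters into base letter + combining mark; drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.casefold().split())


def parse_date(value: str, fmt: str) -> date:
    return datetime.strptime(value.strip(), fmt).date()


def compare(extracted: dict[str, str], packet: dict[str, str]) -> dict[str, bool]:
    """Field-by-field match between OCR output and the packet's demographic data."""
    return {
        "name": normalize_name(extracted["name"]) == normalize_name(packet["fullName"]),
        # Assumed formats: "02 JAN 1990" on the document (%b is matched
        # case-insensitively in the default C locale) and "1990/01/02" in the packet.
        "dateOfBirth": parse_date(extracted["dob"], "%d %b %Y")
        == parse_date(packet["dateOfBirth"], "%Y/%m/%d"),
    }


print(compare({"name": "JOSÉ  DOE", "dob": "02 JAN 1990"},
              {"fullName": "Jose Doe", "dateOfBirth": "1990/01/02"}))
# {'name': True, 'dateOfBirth': True}
```

Returning a result per field, rather than a single pass/fail, lets the stage report which check failed.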
Work Breakdown
Generic classifier stage
Generic OCR Validation Stage
Specific Validation Library - Passport
Name Validation
Date of Birth Validation
Address Validation
Gender Validation
Authenticity Checks
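For the passport library, one concrete consistency check is verifying the check digits in the machine-readable zone (MRZ). ICAO Doc 9303 defines a check digit as the weighted sum of character values (weights 7, 3, 1 repeating) modulo 10, where digits keep their value, letters A to Z map to 10 to 35, and the filler `<` counts as 0. The sketch validates line 2 of a TD3 (passport-size) MRZ; the function names are illustrative. Matching check digits only show that the MRZ is internally consistent: they catch OCR misreads and careless edits, but they are not proof that the document is genuine.

```python
WEIGHTS = (7, 3, 1)


def char_value(c: str) -> int:
    """ICAO 9303 character values: digits as-is, A-Z = 10-35, filler '<' = 0."""
    if "0" <= c <= "9":
        return int(c)
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if c == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {c!r}")


def check_digit(data: str) -> int:
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(data)) % 10


def _matches(field: str, digit: str) -> bool:
    return "0" <= digit <= "9" and check_digit(field) == int(digit)


def verify_td3_line2(line: str) -> dict[str, bool]:
    """Check the five check digits on line 2 of a TD3 (passport) MRZ."""
    if len(line) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")
    return {
        "document_number": _matches(line[0:9], line[9]),
        "date_of_birth": _matches(line[13:19], line[19]),
        "date_of_expiry": _matches(line[21:27], line[27]),
        # An all-filler optional data field may carry '<' as its check digit.
        "optional_data": _matches(line[28:42], line[42]) or line[28:43] == "<" * 15,
        "composite": _matches(line[0:10] + line[13:20] + line[21:43], line[43]),
    }


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"  # ICAO Doc 9303 specimen
print(all(verify_td3_line2(specimen).values()))  # True
```

The same line also carries the date of birth (`line[13:19]`, YYMMDD) and sex (`line[20]`), which could feed the Date of Birth and Gender Validation items once the check digits pass.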