OCR-based document validation

A country may collect a set of pre-defined documents as evidence of identity, date of birth, address, marital status, residence status, etc. Currently there is no step that validates these documents on the server side.

We propose building an OCR-based validation stage. Validation may use a document-type template and OCR extraction techniques.

Reading Material

Registration Packet, ID Schema, Packet Manager

Registration Processor Stages, VertX, Apache Camel and Workflow XML, Tagging Feature

https://academy.mosip.io

http://docs.mosip.io

YouTube Videos

Classifier Stage, Validation Stage

Implementation Approach

A server-side classifier stage will examine the documents inside the packet and decide which ones need to be validated using OCR techniques. This decision will be recorded as tags on the packet using the packet manager tagging feature.
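The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the actual stage: the `PacketDocument` type, the `OCR_SUPPORTED_TYPES` set, and the `OCR_VALIDATION_` tag naming are all assumptions made for the example, and the real stage would write the resulting tags through the packet manager rather than return them.

```python
from dataclasses import dataclass

# Hypothetical: document types for which an OCR template is assumed to exist.
OCR_SUPPORTED_TYPES = {"PASSPORT"}

# Hypothetical tag naming convention; not a defined MOSIP tag.
TAG_PREFIX = "OCR_VALIDATION_"


@dataclass(frozen=True)
class PacketDocument:
    """Illustrative stand-in for a document entry inside a packet."""
    category: str     # e.g. "proofOfIdentity"
    doc_type: str     # e.g. "PASSPORT"
    file_format: str  # e.g. "pdf"


def classify_documents(documents):
    """Return tag key/value pairs marking which documents need OCR validation."""
    tags = {}
    for doc in documents:
        doc_type = doc.doc_type.upper()
        # Tag the document type when OCR validation applies, otherwise "SKIP".
        tags[TAG_PREFIX + doc.category] = (
            doc_type if doc_type in OCR_SUPPORTED_TYPES else "SKIP"
        )
    return tags
```

Keeping the classifier's output as plain key/value pairs means the decision can be stored on the packet and read later by the workflow without the two stages sharing any code.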

The Camel workflow will call an OCR validation stage, passing the specific details of the document to be validated, based on the tags in the packet.
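In the proposal the routing itself lives in the Camel workflow XML. The decision it has to make can be sketched in a few lines; the tag prefix, the `"SKIP"` value, and the request field names below are assumptions for illustration only, not an existing message format.

```python
# Hypothetical tag naming convention, assumed to be written by the classifier.
TAG_PREFIX = "OCR_VALIDATION_"


def build_validation_requests(registration_id, tags):
    """Turn packet tags into one request per document that needs OCR validation."""
    requests = []
    for key, value in sorted(tags.items()):
        # Ignore unrelated tags and documents the classifier marked as skipped.
        if not key.startswith(TAG_PREFIX) or value == "SKIP":
            continue
        requests.append({
            "registrationId": registration_id,
            "documentCategory": key[len(TAG_PREFIX):],
            "documentType": value,
        })
    return requests
```

An empty result would mean the workflow can bypass the OCR validation stage for that packet.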

The OCR validation stage will validate the document using an OCR framework such as Tesseract against a known template for the document in question. The template to be used is determined by the document type and purpose code.
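A minimal sketch of template selection and field extraction, assuming the OCR text has already been produced (for example by Tesseract). The template registry, the `"POI"` purpose code, the field names, and the regular expressions are hypothetical; a real passport template would more likely work on page regions or the machine-readable zone.

```python
import re

# Hypothetical template registry keyed by (document type, purpose code).
TEMPLATES = {
    ("PASSPORT", "POI"): {
        "fullName": re.compile(r"Name[:\s]+([A-Z][A-Z ]+)"),
        "dateOfBirth": re.compile(r"Date of Birth[:\s]+(\d{2}/\d{2}/\d{4})"),
    },
}


def extract_fields(doc_type, purpose_code, ocr_text):
    """Apply the template for (doc_type, purpose_code) to raw OCR text."""
    template = TEMPLATES.get((doc_type, purpose_code))
    if template is None:
        raise KeyError(f"no template for {doc_type}/{purpose_code}")
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(ocr_text)
        # A field the template expects but OCR did not yield is reported as None.
        fields[name] = match.group(1).strip() if match else None
    return fields
```

Returning `None` for a missing field, instead of raising, lets the caller decide whether an unreadable field is a validation failure or a reason for manual review.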

Validation of a document includes validity checks that are specific to the document type, as well as checks that use the data extracted from the document. The extracted data has to be checked against the data in the packet.
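The comparison between extracted data and packet data could look like the sketch below. The field names and both date formats are assumptions; OCR output usually needs normalisation (case, spacing) before comparison, and a production version might use fuzzy matching for names rather than strict equality.

```python
from datetime import datetime

# Assumed formats for this example only: the document prints DD/MM/YYYY,
# while the packet is assumed to store YYYY/MM/DD.
DOCUMENT_DATE_FORMAT = "%d/%m/%Y"
PACKET_DATE_FORMAT = "%Y/%m/%d"


def normalize_name(value):
    """Upper-case the name and collapse repeated whitespace."""
    return " ".join(value.upper().split())


def compare_with_packet(extracted, packet):
    """Return a per-field match result for the hypothetical fields below."""
    doc_dob = datetime.strptime(extracted["dateOfBirth"], DOCUMENT_DATE_FORMAT).date()
    packet_dob = datetime.strptime(packet["dateOfBirth"], PACKET_DATE_FORMAT).date()
    return {
        "fullName": normalize_name(extracted["fullName"])
        == normalize_name(packet["fullName"]),
        "dateOfBirth": doc_dob == packet_dob,
    }
```

Reporting a result per field, rather than a single pass/fail flag, keeps enough detail to explain why a packet was rejected.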

Work Breakdown

Generic Classifier Stage
Generic OCR Validation Stage
Specific Validation Library - Passport
1. Name Validation
2. Date of Birth Validation
3. Address Validation
4. Gender Validation
5. Authenticity Checks
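As one possible example of an authenticity check for passports, the check digits in the machine-readable zone (MRZ) can be recomputed and compared with the printed ones. The proposal does not name this check, so treating it as part of the passport library is an assumption. The algorithm is the ICAO Doc 9303 scheme: weights 7, 3, 1 repeating, digits at face value, letters A to Z as 10 to 35, the filler `<` as 0, and the sum taken modulo 10. The function names are illustrative.

```python
MRZ_WEIGHTS = (7, 3, 1)
DIGITS = "0123456789"


def mrz_char_value(ch):
    """Map one MRZ character to its numeric value."""
    if len(ch) == 1 and ch in DIGITS:
        return int(ch)
    if len(ch) == 1 and "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field):
    """Compute the check digit for an MRZ field."""
    total = sum(
        mrz_char_value(ch) * MRZ_WEIGHTS[i % 3] for i, ch in enumerate(field)
    )
    return total % 10


def mrz_field_is_valid(field, check_digit):
    """Compare the computed check digit with the one read from the document."""
    if len(check_digit) != 1 or check_digit not in DIGITS:
        return False
    return mrz_check_digit(field) == int(check_digit)
```

A failed check digit can mean either a tampered document or an OCR misread, so a mismatch is better treated as a signal for further review than as proof of forgery.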