Optical Character Recognition System for Myanmar Printed Documents (OCRMPD) 🇲🇲

This is a web-based demonstration of the OCRMPD system. Upload an image of Myanmar text to see a simulation of the recognition process based on the proposed algorithms.

1. Upload Document Image

2. Algorithm Processing Stages

Stage A: Segmentation Algorithm

The system first isolates individual characters. A vertical projection histogram (X-Y cut) makes the initial cuts; a Structural Analysis step then refines them, checking pixel density and connected components (CCs) so that complex, overlapping Myanmar characters are segmented correctly. Red boxes mark the detected segment boundaries on the image below, and a code sketch of this two-step idea follows the segment display.

Extracted Character Segments:

Segments from the original image will be displayed here after processing.
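
A minimal sketch of Stage A's two-step idea, assuming a binarized line image stored as a NumPy array with text pixels set to 1. The gap and density thresholds, the function names, and the use of scipy.ndimage.label for connected components are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np
from scipy.ndimage import label

def segment_columns(binary_line, gap_threshold=0):
    """Initial X-cut: split a binarized text line (text pixels = 1) wherever
    the vertical projection drops to the gap threshold."""
    projection = binary_line.sum(axis=0)          # text-pixel count per column
    segments, start, in_segment = [], 0, False
    for x, count in enumerate(projection):
        if count > gap_threshold and not in_segment:
            start, in_segment = x, True           # a character run begins
        elif count <= gap_threshold and in_segment:
            segments.append((start, x))           # the run ends at this gap
            in_segment = False
    if in_segment:                                # run reaching the right edge
        segments.append((start, binary_line.shape[1]))
    return segments

def keep_segment(binary_line, x0, x1, min_density=0.05):
    """Illustrative structural check: reject near-empty cuts and expose the
    connected-component count for the refinement step."""
    patch = binary_line[:, x0:x1]
    _, num_ccs = label(patch)                     # connected components in the cut
    return patch.mean() >= min_density, num_ccs
```

In a fuller implementation, the density check and the CC count could drive the refinement the description mentions, for example merging over-segmented cuts or splitting cuts that still contain touching characters.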

Stage B: Feature Extraction Algorithm

After segmentation, each character image is normalized. A hybrid statistical method then builds a feature vector for each character from two kinds of measurements: Zone Density (pixel density in three horizontal zones) and Projection Area (the areas enclosed by the top, bottom, left, and right projection profiles).
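
As a rough illustration of how such a vector could be assembled, the sketch below combines three horizontal-zone densities with four profile areas measured from the top, bottom, left, and right of a normalized, binarized character image. The exact definition of "Projection Area", the normalization, and the function name are assumptions for illustration only.

```python
import numpy as np

def extract_features(char_img):
    """Feature vector for one normalized, binarized character image
    (text pixels = 1): 3 horizontal-zone densities + 4 profile areas."""
    img = char_img.astype(float)

    # Zone Density: mean pixel density of three horizontal bands
    zone_density = [zone.mean() for zone in np.array_split(img, 3, axis=0)]

    # Projection Area: for one side, measure the background gap between the
    # border and the first text pixel of every column, normalized by image size
    def profile_area(view):
        depth = view.argmax(axis=0)                    # first text pixel per column
        depth[view.max(axis=0) == 0] = view.shape[0]   # empty columns: full depth
        return depth.sum() / view.size

    projection_area = [profile_area(img),              # from the top
                       profile_area(img[::-1, :]),     # from the bottom
                       profile_area(img.T),            # from the left
                       profile_area(img.T[::-1, :])]   # from the right

    return np.array(zone_density + projection_area)
```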

Stage C: Classification Algorithm

Finally, a Hierarchical Multi-class Support Vector Machine (SVM) classifier receives the feature vector. Because the Myanmar character set is large and many of its glyphs are visually similar, the hierarchical model narrows the decision in stages to determine the final text output.
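
The sketch below shows one way a two-level hierarchy could be realized with scikit-learn SVMs: a top-level classifier assigns the feature vector to a character group, and a per-group classifier then selects the character. The grouping scheme, the RBF kernel, and the class structure are illustrative assumptions rather than the system's actual design.

```python
import numpy as np
from sklearn.svm import SVC

class HierarchicalSVM:
    """Two-level sketch: a top-level SVM picks a character group, then a
    group-specific SVM picks the character within that group."""

    def __init__(self):
        self.group_clf = SVC(kernel="rbf")        # level 1: group decision
        self.char_clfs = {}                       # level 2: one SVM per group

    def fit(self, X, char_labels, group_labels):
        X = np.asarray(X)
        char_labels = np.asarray(char_labels)
        group_labels = np.asarray(group_labels)
        self.group_clf.fit(X, group_labels)
        for group in np.unique(group_labels):
            mask = group_labels == group
            chars = char_labels[mask]
            if len(np.unique(chars)) == 1:        # trivial group: remember the label
                self.char_clfs[group] = chars[0]
            else:
                self.char_clfs[group] = SVC(kernel="rbf").fit(X[mask], chars)
        return self

    def predict(self, X):
        X = np.asarray(X)
        results = []
        for x, group in zip(X, self.group_clf.predict(X)):
            clf = self.char_clfs[group]
            results.append(clf if not isinstance(clf, SVC)
                           else clf.predict(x.reshape(1, -1))[0])
        return results
```

Splitting the decision this way keeps each SVM's candidate set small, which is where the efficiency claim for the hierarchical model comes from.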

3. Recognized Text Output

The recognized text will appear here...