
CRBLP

Contact Information

Center for Research on Bangla Language Processing
BRAC University
66, Mohakhali, Dhaka-1212
Phone: +88 (02) 8824051-4 Ext:4023
Fax: +88 (02) 8810383
crblp@bracu.ac.bd


::--Optical Character Recognition --::

Name:: BanglaOCR [Current updates]

Summary::

This project aims to develop an Optical Character Recognizer that can recognize Bangla script. The OCR research and development work is divided into three major parts: preprocessing, classification, and post-processing. We experimented with several techniques for each part and chose the most appropriate methods for our implementation. Currently we use the Tesseract OCR engine to perform the recognition task.

Details::

BanglaOCR is an Optical Character Recognizer for Bangla script. It takes scanned images of a printed page or document as input and converts them into editable Unicode text. The current version of BanglaOCR consists of several independent parts, as listed below.

  • Preprocessing
  • Classification
  • Post-processing

The preprocessing task involves image acquisition, binarization, noise elimination, skew detection and correction, and segmentation at the line, word, and character levels. Bangla character segmentation is one of the most significant challenges. For classification we use the Tesseract OCR engine (one of the most accurate free software OCR engines currently available). Post-processing is performed at two levels. At the first level, recognition mistakes are corrected using a set of rules. At the second level, a suggestion-based spell checker identifies erroneous words and produces suggestions.
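
Binarization, one of the preprocessing steps named above, can be illustrated with a short sketch. The page does not say which binarization method BanglaOCR uses, so the example below assumes Otsu's global thresholding as a stand-in. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is a plain list of rows of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Assumption: Otsu's method is used here only as an illustration;
    it is not confirmed to be the method BanglaOCR uses.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t (dark class)
    sum0 = 0    # sum of gray values in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0          # number of pixels with value > t
        if w0 == 0:
            continue             # dark class is still empty
        if w1 == 0:
            break                # bright class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image, threshold):
    """Map each pixel to 1 (ink, value <= threshold) or 0 (background)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# Hypothetical 2x3 grayscale patch: dark ink on a bright page.
image = [[10, 12, 200],
         [11, 210, 205]]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
binary = binarize(image, t)
```

A global threshold like this is the simplest choice; unevenly lit or degraded scans usually need a locally adaptive method instead.
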
The goal of the BanglaOCR project is to develop a market-standard multilingual OCR system capable of digitizing a wide range of Bangla document images. This will help archive documents from all spheres and prevent the damage and loss of valuable documents and books.
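
The two-level post-processing described above (rule-based correction followed by a suggestion-based spell checker) can be sketched as follows. This is a minimal illustration, not the project's actual implementation. The rule table, the lexicon, and the function names are hypothetical, and Python's standard `difflib` module stands in for whatever spell checker BanglaOCR uses.

```python
import difflib

# Hypothetical correction rules: (misrecognized sequence, replacement).
# The real BanglaOCR rule set is not given on this page.
RULES = [
    ("\u09be\u09be", "\u09be"),  # collapse a doubled vowel sign aa-kar
]

# Hypothetical tiny lexicon; a real spell checker would use a full word list.
LEXICON = ["বাংলা", "ভাষা", "দলিল"]


def apply_rules(word, rules=RULES):
    """Level 1: apply each substitution rule to the recognized word."""
    for wrong, right in rules:
        word = word.replace(wrong, right)
    return word


def suggest(word, lexicon=LEXICON, n=3):
    """Level 2: return up to n suggestions if the word is not in the lexicon."""
    if word in lexicon:
        return []
    return difflib.get_close_matches(word, lexicon, n=n, cutoff=0.6)


def post_process(words):
    """Run both levels and return (corrected_word, suggestions) pairs."""
    results = []
    for word in words:
        fixed = apply_rules(word)
        results.append((fixed, suggest(fixed)))
    return results


# The first word is repaired by a rule; the second is flagged with a suggestion.
output = post_process(["বাংলাা", "ভাযা"])
```

Keeping the two levels separate means cheap deterministic rules fix systematic recognizer errors first, so the spell checker only sees the words that remain doubtful.
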


Team::

Status::

  • Released version 0.6. [Status]
  • The first open source version of BanglaOCR is released under the GNU General Public License (GPL), version 2 or later. See the Download section.

Research Scope::

  • Research on the usability of BanglaOCR on different types of document images.
  • Research on preprocessing techniques for historical Bangla document images.
  • Research on handwritten Bangla images.
  • Research on training and recognition using different classifiers.
  • Research on multilingual OCR.

Development Scope::

  • Reimplement the existing versions in different programming languages.

Download:: http://code.google.com/p/banglaocr/downloads/list

Timeline:: 2007 – 2009

 

Center for Research on Bangla Language Processing
BRAC University, Dhaka, Bangladesh
© All Rights Reserved 2008