Tesseract-OCR 3.02.02

An Optical Character Recognition (OCR) engine started at HP Labs and now under development at Google that can help users grab text from pictures


What's new in Tesseract-OCR 3.02.02:

  • Moved ResultIterator/PageIterator to ccmain.
  • Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
  • Added paragraph detection in layout analysis/post OCR.
  • Fixed inconsistent xheight during training and over-chopping.
Apache License 2.0 
12.8 MB
Tesseract-OCR usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
Turning text into an image is not a difficult task, but extracting words from an image file can be quite troublesome. That job calls for a special kind of software, namely an Optical Character Recognition (OCR) utility.

One of the top engines created for this purpose is Tesseract, and those who want to try it have the Tesseract-OCR package at their disposal.

Installation with multiple options

Before using this tool, it is worth paying attention to the setup procedure, as it offers some useful extras that may be needed when handling documents in foreign languages.

More precisely, the 'Language data' section enables you to choose the desired languages and also add the math and equation detection module if you plan to extract this type of data as well.
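To check which language packs actually ended up on disk after setup, a short script can list them. The sketch below assumes the language data is stored as `<code>.traineddata` files in a single folder; the folder path is something you supply yourself, and the function name `installed_languages` is made up for this illustration.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes found as *.traineddata files in tessdata_dir.

    tessdata_dir is an assumption: pass the folder where your installation
    keeps its language data.
    """
    # "eng.traineddata" -> "eng"; files with other extensions are ignored.
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

Calling `installed_languages(r"C:\path\to\tessdata")` with your own folder returns a sorted list of codes such as `eng`, which are the values the `-l` option expects.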

No GUI and quick execution via Command Prompt

As soon as Tesseract-OCR is installed on your system, you can run it from the command line and start using it immediately. There are only a few parameters to apply when working on the target files, and they are explained well enough.

The most important values are those for the 'pagesegmode' parameter, which mainly control page segmentation and image handling.
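For repeated jobs, the command line can be assembled from a script. The following is a minimal sketch built on the usage string `imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`; it assumes the executable is named `tesseract` and is on your PATH, and the helper names `build_command` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess


def build_command(image, outputbase, lang=None, psm=None, configs=()):
    """Build the argument list: imagename outputbase [-l lang] [-psm N] [configfile...]."""
    cmd = ["tesseract", str(image), str(outputbase)]  # executable name is an assumption
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]  # page segmentation mode
    cmd += list(configs)  # optional config file names come last
    return cmd


def run_ocr(image, outputbase, **options):
    """Run the command, failing early if the executable cannot be found."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(build_command(image, outputbase, **options), check=True)
```

For example, `build_command("scan.png", "out", lang="eng", psm=3)` produces the arguments for an English-language run. The value 3 is only an illustration; the tool's own usage output lists the available modes.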

Fast operation and widely supported output

One of the main strengths of Tesseract-OCR is its ability to recognize and process a variety of image file types. Another advantage of this utility is its processing speed, which should satisfy most users.

When it comes to saving the extracted content, the program generates text (TXT) files with the names you set before starting the task.
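Because the result lands in a text file named after the output base you pass on the command line, a script can pick it up afterwards. This sketch assumes the engine appends `.txt` to the output base and writes UTF-8 text; the helper names are invented for the example.

```python
from pathlib import Path


def output_text_path(outputbase):
    """Return the path of the text file expected for a given output base."""
    # Append ".txt" rather than replacing a suffix, so "page.v2" -> "page.v2.txt".
    return Path(str(outputbase) + ".txt")


def read_ocr_text(outputbase):
    """Read the extracted text, assuming UTF-8 encoding."""
    return output_text_path(outputbase).read_text(encoding="utf-8")
```

After a run with the output base `out`, `read_ocr_text("out")` would return the contents of `out.txt`.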

Simple tool for all users

All things considered, this command-line application should not be too difficult to understand, even for less experienced users, as it uses a fairly simple syntax. It is quick in processing and accurate enough to be considered among the best in its category.

Tesseract-OCR review last updated on February 24th, 2014

Runs on: Windows XP / Vista / 7

