Tesseract-OCR 3.02.02

An Optical Character Recognition (OCR) engine, started at HP Labs and now developed at Google, that helps users extract text from pictures.
Converting text into graphics is not a difficult task, but extracting words from an image file can be quite troublesome. This kind of job requires a specialized tool, namely an Optical Character Recognition (OCR) utility.

Tesseract is one of the top engines created for this purpose, and anyone who wants to try it can use the Tesseract-OCR package.

Multiple setting installation

Before using this tool, it is worth paying attention to the setup procedure, as it offers some useful extras that may be required when handling documents in foreign languages.

More precisely, the 'Language data' section enables you to choose the desired languages and also add the math and equation detection module if you plan to extract this type of data as well.
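Each language selected in the installer ends up as a `<code>.traineddata` file (for example `eng.traineddata`) in Tesseract's `tessdata` folder, and that code is what the `-l lang` option expects. The Python sketch below lists the languages that are actually installed. The `tessdata` path in it is an assumption about a default Windows install, and `language_codes` and `installed_languages` are hypothetical helpers, not part of Tesseract.

```python
from pathlib import Path

SUFFIX = ".traineddata"


def language_codes(filenames):
    """Return sorted language codes taken from names such as 'eng.traineddata'."""
    return sorted(name[: -len(SUFFIX)] for name in filenames if name.endswith(SUFFIX))


def installed_languages(tessdata_dir):
    """List language codes for the .traineddata files in tessdata_dir ([] if the folder is missing)."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return language_codes(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumed install location; point this at the folder chosen during setup.
    print(installed_languages(r"C:\Program Files\Tesseract-OCR\tessdata"))
```

Any code returned here can be passed to `-l`; a language that is missing from the list has to be added by re-running the setup and ticking it in the 'Language data' section.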

No GUI and quick execution via Command Prompt

As soon as Tesseract-OCR is installed on your system, you can run it from the command line and start using it immediately. There are only a few parameters to apply to the target files, and they are explained clearly enough.

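The most important values are those for the 'pagesegmode' parameter (passed with the -psm flag), which controls page segmentation: whether Tesseract analyzes the image as a full page, a single block or column of text, a single line, a single word or a single character.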
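The usage syntax of this release is `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`, and its help output lists page segmentation modes 0 to 10. As a minimal sketch, the Python helper below (`build_command` is a hypothetical name) assembles an argument list in that order and rejects modes outside that range; the mode descriptions are paraphrased from the 3.x help text.

```python
# Page segmentation modes, paraphrased from the Tesseract 3.x command-line help.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD, or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
}


def build_command(imagename, outputbase, lang=None, psm=None, configfiles=()):
    """Assemble argv in the order: imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]."""
    args = ["tesseract", imagename, outputbase]
    if lang is not None:
        args += ["-l", lang]
    if psm is not None:
        if psm not in PSM_MODES:
            raise ValueError(f"unsupported page segmentation mode: {psm}")
        args += ["-psm", str(psm)]
    args.extend(configfiles)
    return args


if __name__ == "__main__":
    # Treat a cropped screenshot as a single line of text (mode 7).
    print(" ".join(build_command("line.png", "line", lang="eng", psm=7)))
```

Assuming `tesseract` is on the PATH, the resulting list can be handed to `subprocess.run(cmd, check=True)`. Keeping the options in the documented order avoids surprises, since anything after the flags is treated as a config file name.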

Fast operation and widely supported output

One of the main strengths of Tesseract-OCR is its ability to recognize and process a variety of image file types. Another advantage is its processing speed, which should satisfy most users.

When it comes to saving the extracted content, the program writes a plain text (TXT) file named after the 'outputbase' argument you supply before starting the task, with the .txt extension appended automatically.
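Since Tesseract names the text file after `outputbase` and appends `.txt`, a script can predict where the result will land and read it back. The following Python sketch assumes the `tesseract` executable is on the PATH and that the recognized text is written as UTF-8; `output_path` and `ocr_to_text` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def output_path(outputbase):
    """Tesseract appends '.txt' to outputbase for plain-text output."""
    return Path(str(outputbase) + ".txt")


def ocr_to_text(imagename, outputbase, lang="eng"):
    """Run tesseract on imagename and return the text it wrote to <outputbase>.txt."""
    # check=True raises CalledProcessError if tesseract exits with an error status.
    subprocess.run(["tesseract", imagename, outputbase, "-l", lang], check=True)
    return output_path(outputbase).read_text(encoding="utf-8")
```

For example, `ocr_to_text("invoice.png", "invoice")` would leave `invoice.txt` next to the script and return its contents. Passing `outputbase` without an extension is deliberate: supplying `invoice.txt` would produce `invoice.txt.txt`.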

Simple tool for all users

All things considered, this command-line application should not be too difficult for less experienced users to understand, as it uses a fairly simple syntax. It processes files quickly and is accurate enough to be considered among the best in its category.

Reviewed by Olivian Puha on February 24th, 2014


Last updated on: August 5th, 2013, 22:47 GMT
File size: 12.8 MB
Price: FREE!
Developed by: theraysmith
License type: Apache License 2.0
Operating system(s): Windows XP / Vista / 7
Category: C: \ Programming \ Other Programming Files


Softpedia rating: 4.0/5
Screenshot: Tesseract-OCR displaying the application's usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
What's New in This Release:
  • Moved ResultIterator/PageIterator to ccmain.
  • Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
  • Added paragraph detection in layout analysis/post OCR.
  • Fixed inconsistent xheight during training and over-chopping.

Application description

Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little ...
