Fog Creek Software
Discussion Board




Open Source OCR Software


Can anyone recommend good OCR software that can transform a scanned document (a page from a book) into an electronic file (.txt, Word, or PDF...)?

>> TD <<
Friday, January 02, 2004

Do you mean "open source", or "free"?

Dennis Forbes
Friday, January 02, 2004

I believe there are three or four projects that are pretty far along. Search on sourceforge.net and freshmeat.net...

Eric DeBois
Friday, January 02, 2004

Dennis:

Free ;-)

>> TD <<
Friday, January 02, 2004

Well, OCR software covers a wide variety of topics. I assume you mean Latin-based OCR software, and as far as recommending "good" open source/free OCR software goes, the long and short of it is: no. I cannot recommend any "good" open source/free OCR packages for Latin-based recognition.

Elephant
Friday, January 02, 2004

Elephant:

You're assuming correctly: Latin-based OCR software is what I'm looking for.

>> TD <<
Friday, January 02, 2004

"Open Source OCR Software. Can anyone recommend ..."
"Do you mean "open source", or "free"? "
"Free ;-)"

I rest my case :-)

Just me (Sir to you)
Friday, January 02, 2004

What case??

Who wants to pay for beer when you can get it for free?

Basic mathematics would suggest that utility per $$ is going to be higher where said $$ are low.

On the other hand, some of the free beer is pretty nasty stuff, but if your stomach can handle it, by all means go ahead.

Then again, folk do pay £250k for a Rolls when £100 cars are available.

Tapiwa
Friday, January 02, 2004

The case that "Open Source" == "free (as in beer)" in just about everybody's mind. Who cares about source code, dude? I just don't want to pay!

Even developers no longer seem to distinguish between the two concepts, to the point where both expressions are used as complete synonyms.

Just me (Sir to you)
Friday, January 02, 2004

Worse, they don't distinguish between free for Linux and free for Windows.

The long and short of TD's request is that there are loads of "free" OCR programs available. Whenever you buy a scanner, you get a "free" Windows OCR program bundled with it.

The problem, of course, is that you very much get what you pay for: cheap OCR software simply doesn't recognize text as accurately as the latest edition of OmniPage Pro does. The difference isn't that free means a clunkier interface or no advanced features.

Any company that is going paperless should think of investing in one Windows machine with OmniPage Pro 12 installed, and of course the best scanner they can get, preferably with a multi-sheet feeder. The quality of the grayscale image the scanner produces has a considerable effect on the accuracy of the OCR.

Stephen Jones
Friday, January 02, 2004

Stephen is right.

The free ones work, but not well enough to be useful. As an OCR developer using the ScanSoft 12 engines (read: OmniPage), I would say that this product is the best out there for Latin OCR.

What class of documents are you trying to recognize? That also has a large impact on the type of engine you use. Quite a few of the engines that the OmniPage product uses are third-party engines, so depending on your needs, it may be overkill.

All I know is that you want to OCR Latin documents, hopefully for "free". Let me know more about your intentions, and I can try to point you in the right direction.

Elephant
Friday, January 02, 2004

Just me... you are right.

I do apologise. I get miffed by the same thing too.

I hastily jumped to the conclusion that you belonged to the "open source is evil because it is free" camp.

Some free stuff is closed source. Some paid-for stuff is open source (once you pay, they give you the source).

At times, though, the fight to correct this misuse of terms seems as pointless as trying to convince people that

method != methodology

Tapiwa
Friday, January 02, 2004

For me it would split along the lines of intended use. If someone is looking for an OCR engine to interface with and make part of their own application, then they're looking for some kind of open source library or filter application.

On the other hand, if they just got a second-hand scanner with no software, they just want some end-user app for free, regardless of whether it's open source or not.

Simon Lucy
Friday, January 02, 2004

If I were still in college, I would be waiting for someone to say,

Oracle for Linux,
I rest my class.

But now I am just old old old and my fart stinks more than most people's, so I'll just write proprietary code that rocks.

Have any of the older JOS forum addicts written actually successful (successful as in getting people to pay for the beer) OSS software? Just wondering: since everyone seems to know so much about it, I figured some of you have great stories to share...

Li-fan Chen
Saturday, January 03, 2004

Open source == no dollars.

Mike
Saturday, January 03, 2004

If you can't afford a decent program like OmniPage Pro, then you are not employed. Thus you have lots of free time. Just type in the documents by hand.

Dennis Atkins
Saturday, January 03, 2004

Or employed in a big bureaucratic corporation that takes two months to approve the purchase of a $30 piece of software.

T. Norman
Sunday, January 04, 2004

I really don't remember, and I am not sure it's in decent enough shape for day-to-day work, but I think Xerox may have some open (but patented) projects that solve OCR problems. But really, unless you are programming an assembly-line direct-mail response parser or something fancy-pants like that, you couldn't go far wrong spending a little money on Omni or TextBridge.

Li-fan Chen
Monday, January 05, 2004
