OCR from PDF file
I have a printed address list that I want to import into Excel. The list is too long for me to type, so I want to scan and OCR it.
Zahid
Is there a reason they scan it to a PDF file? It would make more sense just to scan it to a bitmap format, like TIFF or JPEG, which almost any OCR software should support.
Robert Jacobson
There's an Acrobat plug-in that can do OCR on Acrobat files, but all that gives you is a more searchable PDF; it doesn't extract the actual text. Plus you need to buy the full version of Acrobat, and you've already said you're cheap.
Chris Tavares
Is there some reason you can't scan the list in chunks? If you have a 40-inch tall list that won't fit on your scanner, I would think you can just fold it in thirds and make three scans.
Caliban Tiresias Darklock
ScanSoft's SDK supports it, so I imagine OmniPage 14 or whatever supports it. $99 and it's the best thing that's out there. In fact, it's a really cool feature. It can create a new PDF, replacing the existing text with similar fonts and leaving the original image untouched for unrecognizable characters.
Elephant
I specifically remember having an OCR plugin once. It actually gave out real text, but the 30 days expired and here I am.
Alex.ro
I think getting them to OCR it for you is the best bet. Shop around, you may find cheaper places.
DJ
Why not pay some University student to type it up for you? It'd be pretty cheap and not too hard to organise.
Koz
Mjau
Caliban Tiresias Darklock: When I said "it's a long list", I didn't mean a single long page. I meant 500 pages. And I don't want to do 500 scans.
zahid
>I may end up taking your advice and having them OCR it. Saves me a ton of work, and then I have someone to blame if the OCR sucks.
a2800276
Download Ghostscript (free) and use the 'extract text' function.
--
Ghostscript will only extract text encoded as text, not graphics.
Ged Byrne
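For anyone who wants to try the Ghostscript route anyway, here is a minimal sketch of what it might look like. It assumes a `gs` executable on the PATH (on Windows it is usually `gswin64c`) and uses Ghostscript's `txtwrite` output device; the function names and file names are made up for illustration. As noted above, a PDF that holds only scanned images has no text layer, so the result would come back empty and you would still need real OCR.

```python
import subprocess
from typing import List


def build_gs_text_command(pdf_path: str, out_path: str) -> List[str]:
    """Build a Ghostscript command line that dumps a PDF's text layer.

    Assumes the executable is called "gs"; adjust for your platform.
    """
    return [
        "gs",
        "-q",                 # suppress startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit when the input is processed
        "-sDEVICE=txtwrite",  # text extraction device
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


def extract_text(pdf_path: str, out_path: str) -> str:
    """Run Ghostscript and return whatever text it managed to extract."""
    subprocess.run(build_gs_text_command(pdf_path, out_path), check=True)
    with open(out_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def has_text_layer(extracted: str) -> bool:
    """True if the extraction produced anything besides whitespace."""
    return bool(extracted.strip())
```

If `has_text_layer(extract_text("list.pdf", "list.txt"))` comes back false, the PDF is most likely image-only and Ghostscript will not help.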
View the PDF file and take screenshots of each page. Apply OCR to taste.
What I ended up doing was paying them to OCR it. They finally quoted me a much lower price -- rather than three times the price per page, it is just a flat $25 additional. So I paid it.
Zahid
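Once the OCR'd text comes back, getting it into Excel could be as simple as writing a CSV file. The sketch below is only a guess at the layout: it assumes each address is a block of lines separated from the next by a blank line, which may not match the real list at all. The function name is hypothetical.

```python
import csv
import io


def address_blocks_to_csv(ocr_text: str) -> str:
    """Turn blank-line-separated address blocks into CSV rows.

    Assumption: one address per block, one field per line.
    """
    rows = []
    current = []
    for line in ocr_text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the current address block.
            rows.append(current)
            current = []
    if current:
        rows.append(current)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

Saving the returned string to a `.csv` file gives something Excel can open directly, with fields that contain commas quoted automatically.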
Zahid, I emailed you and offered to do it for free. Seems that is better still than a $25 flat fee.
Elephant