Windows 7, Christmas and a long-range project

December 30, 2009

The only program that I’ve found so far that I wasn’t able to install on Windows 7 was my OmniPage 14 OCR program.

For those who don’t know the term, OCR stands for optical character recognition – in other words, a program that recognizes text in a scanned image and turns it into editable text.

For Christmas, I received a copy of OmniPage 17. I plan to use it in a long-range project that involves old public domain books.

Like any other processing software, an OCR program is limited to some degree by the quality of its input, which in this case means the scanned images.

For my first test of the program, I loaded a PDF file of a 440-page book published in 1913. OmniPage 17 was able to load and process the entire book, unlike previous versions of OmniPage and other OCR software that I have used. To be fair, I’m sure those programs were also somewhat limited by the operating systems and computers of the time.

During processing, OmniPage 17 flags text that it is not “certain” of and gives the user the opportunity to correct or ignore it. The percentage of flagged text was far lower than I expected.

After the book was processed by OmniPage, I saved it and proofed it in Microsoft Word.

The proofing was, by far, the hardest part of the process. I read the entire book, with much more attention to detail than I would have if I had just been reading it for pleasure, in order to correct any errors that the OCR might have made and to italicize words that were in italics in the book. Again, the number of corrections needed was far lower than I expected. I suspect that there would be very few corrections needed when converting modern documents from image to text.

What was the book that I converted, some might wonder?

It’s part of the long-term project, so I don’t want to be too specific at this time, other than to say that it was the diary of a lady who had been raised in privilege.

5 comments

1 Dot January 1, 2010 at 9:37 PM

Intriguing! It’s unfortunate that Windows 7 couldn’t handle OmniPage 14, but nice that it led to a great Christmas gift.

BTW, what is the purpose of those mini-windows that Windows Vista and Windows 7 pop up when the cursor is on the bar at the bottom? I haven’t figured it out.

2 Rose January 1, 2010 at 9:58 PM

I agree it is unfortunate that Windows 7 couldn’t handle it. Sounds like a great program.

3 Mike Goad January 1, 2010 at 10:20 PM

Dot & Rose – OmniPage 17 is much better than 14 was, so it all turned out better in the long run.

The purpose of the little windows is to let you see what is open when you run the cursor over an icon. For instance, right now if I put the cursor over my Firefox icon, four windows will come up. One of them will show a smaller version of this page, another is the Firefox download page, and the other two are open for editing another project that I am working on.

4 teeni January 3, 2010 at 7:10 PM

I’m glad you were able to get a newer version of the OCR software and that it was compatible with Windows 7. It sounds like fun. I had the opportunity to do some OCR work a number of years ago and I enjoyed “teaching” it to read new fonts and things. Now I will sound like a nerd. LOL. Hi Mike – hope 2010 is being good to you! :)

5 Mike Goad January 4, 2010 at 7:09 AM

teeni – Thanks. The nice thing about this software is that I don’t seem to have to “teach” it at all. It gets almost everything right, to the point that, when I proof the OCR’s product, I have to read it very carefully, looking for typos.

Best wishes for 2010 and beyond. :)
