Windows 7, Christmas and a long-range project

The only program that I’ve found so far that I wasn’t able to install on Windows 7 was my OmniPage 14 OCR program.

For those who don’t know the term, OCR stands for optical character recognition – in other words, a program that recognizes text in a scanned image and turns it into editable text.

For Christmas, I received a copy of OmniPage 17. I plan to use it in a long-range project that involves old public-domain books.

Like any other processing software, an OCR program will to some degree be limited by the quality of its input, which in this case means the scanned images.

For my first test of the program, I loaded a PDF file of a 440-page book published in 1913. OmniPage 17 was able to load and process the entire book, unlike previous versions of OmniPage and other OCR software that I have used, though I’m sure that those programs were also somewhat limited by the operating systems and computers of their time.

During the processing, OmniPage 17 flags text that it is not “certain” of and gives the user the opportunity to correct or ignore it. The percentage of flagged text was far lower than I expected.

After the book was processed by OmniPage, I saved it and proofed it in Microsoft Word.

The proofing was, by far, the hardest part of the process. I read the entire book, with much more attention to detail than I would have if I had just been reading it for pleasure, in order to correct any errors that the OCR might have made as well as to italicize words that were in italics in the book. Again, the number of corrections needed was far lower than I expected. I suspect that there would be very few corrections needed when converting modern documents from image to text.

What was the book that I converted, some might wonder?

It’s part of the long-term project, so I don’t want to be too specific at this time other than to say that it was a diary of a lady who had been raised in privilege.

