Page Files

Explains the procedures we use to get the best-quality OCR of each page.

Attention: This section is no longer maintained and is posted for information only.

The process of digitizing historical editions of Encyclopedia Britannica begins with capturing text from page images, one page at a time. A page file contains the text captured from a single page by OCR. This section explains the process used to capture the textual data from each page and to store it as a page file.

ABBYY FineReader (AFR) is a complex program with many options for managing the OCR process. The following instructions are written for AFR 14 and 15. The interface is designed for offices converting paper documents to PDF or Microsoft Word files; we will ignore most of it and focus on the OCR engine.

A good general introduction to the program is the AFR 15 User Guide online. On the first page, you will see a link to download it as a PDF, if you wish.