Page Files

Explains the procedures we use to get the best quality OCR of each page.

Attention:

This section is no longer maintained and is posted for information only.

The process of digitizing historical editions of Encyclopedia Britannica begins with capturing text from images of pages, one page at a time. A page file contains the text captured from a single page by OCR. This section explains how we capture the textual data from each page and store it as a page file.

ABBYY FineReader (AFR) is a complex program with many options for managing the OCR process. The following instructions are written for AFR 14 and 15. The interface is designed for offices converting paper documents to PDF or Microsoft Word files, and we will ignore most of it to focus on the OCR engine.

A good general introduction to the program is the online AFR 15 User Guide. Its first page has a link to download the guide as a PDF.