What is the Nineteenth-Century Knowledge Project?
Transforming Nineteenth-Century Knowledge
About Encyclopedia Britannica and the history of knowledge.
Acknowledgements for all contributors.
How we keep hundreds of thousands of files organized.
Edition-Section System
File organization depends on two basic folder types.
Folder names
As the OCR workflow passes through its various stages, production moves into specific folders for each stage. Their names and contents are given below:
A guide to the different repositories used to store ocr-project data.
Setting Up the Repositories
Create local copies of the remote repositories.
Page Files
Explains the procedures we use to get the best quality OCR of each page.
AFR Interface
Learn about the main elements of the program interface.
Create a Page-Inventory File
Create a page-inventory file.
Create an Image Collection
Organize image files for scanning.
Create an OCR-Project
How to create and manage an OCR-Project.
Recommended settings for all options in ABBYY FineReader
Draw Boxes
Manually creating text-recognition boxes improves accuracy.
Page Recognition
Excellent page recognition depends on preparing pages properly.
Save and Output
How to output your OCR results.
This introduction to Oxygen XML Editor shows you how to navigate the interface and perform standard procedures on the Encyclopedia files.
Entry Files
Procedures for converting single pages into Encyclopedia entries.
Prep and Convert Pages to Entry Files
Before page files can be converted to entry files, we need to do some housekeeping.
Entry-Inventory File
Document the filenames of every entry in a section using the entry-inventory file.
Process Entry Files
Run the cleanup routine on all new entry files.
Master Files
After creating entry files, we run an automated metadata generation process to add index terms to each entry.
Automated Metadata Procedure
How we create subject headings for every entry file.
Reference information on file and folder names, TEI-encoding standards, and Unicode characters.
Editorial Standards
The following editorial principles are employed in creating this digital edition.
Image Sources
Bibliographic information on image sources.
Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
TEI Style Manual
TEI encoding practices for the Knowledge Project.
Unicode Characters
List of Unicode characters and entities that appear frequently in the Encyclopedia but are not on the standard US keyboard.

Project Director Peter Melville Logan
National Endowment for the Humanities HAA-261228-18