What is the Knowledge Project?
Staff and contributors
With hundreds of thousands of files, the Knowledge Project needs a clear way to organize its data. We apply specific naming conventions to all files and folders to keep this large body of material in order.
Archive Folder
Used for long-term storage of image files
Code Folder
The central repository for our programming routines and transformation scripts.
Digital Edition Folder
Contains files used to generate an edition or run analyses
Information Folder
Contains information related to the project
Production Folder
Contains all files involved in creating the textual data from page images of the Encyclopedia Britannica
An introduction to AFR that explains the procedures we use to get the best-quality text recognition from Encyclopedia page images.
Create an Image Collection
How to organize image files for scanning.
AFR Interface
Learn about the main elements of the program interface
Create an OCR-Project Folder
How to create and manage the OCR-Project folder.
Draw Boxes
Manually drawing text-recognition boxes improves accuracy.
Page Recognition
Excellent page recognition depends on preparing the page properly.
Save and Output
How to output your OCR results.
This introduction to Oxygen XML Editor shows you how to navigate the interface and perform standard procedures on the Encyclopedia files.
Oxygen Interface
An introduction to the main components of the Oxygen interface.
Create an XML-Project
Use Oxygen XML Editor to organize project files.
Convert DOCX to TEI
Use Oxygen and XSLT to convert DOCX files to TEI.
Convert Page to Entry Files
Before page files can be converted to entry files, we need to do some housekeeping.
Procedures for converting single pages into Encyclopedia entries.
Validate Entry Files
Use Oxygen to validate the entry files.
Entry-Inventory File
Document the file names of every entry in a section using the entry-inventory file.
Proof the Entry Terms List
Compare our list of recognized entry terms with the print pages.
Reference information on file and folder names, TEI encoding standards, and Unicode characters.
Editorial standards
The editorial principles used in creating this digital edition.
Image Sources
Bibliographic information on print editions and image repositories.
Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
TEI Style Manual
All TEI encoding must follow these guidelines.
Unicode Characters
List of Unicode characters and entities that appear frequently in the Encyclopedia but are not on the standard US keyboard.

Project Director Peter Logan