What is the Knowledge Project?
About
Staff and contributors
With over 100,000 files, the Knowledge Project needs a clear way to organize its data. We apply specific naming conventions to all files and folders to keep this large quantity of material in order.
Archive Folder
Used for long-term storage of image files
Digital Edition Folder
Contains files connected with generating an edition or running analyses
Information Folder
Contains information related to the project
Production Folder
Contains all files involved in creating the textual data from page images of the Encyclopedia Britannica
An introduction to AFR that explains the specific procedures we use to get the best quality text recognition of Encyclopedia page images.
AFR Interface
Learn about the main elements of the program interface
Create an OCR-Project Folder
How to create and manage the OCR-Project folder.
Draw Boxes
Manually creating text recognition boxes improves accuracy
Page Recognition
Excellent page recognition depends on preparing the page properly.
Save and Output
How to output your OCR results.
This introduction to Oxygen XML Editor shows you how to navigate the interface and perform standard procedures on the Encyclopedia files.
Oxygen Interface
An introduction to the main components of the Oxygen interface.
Create an XML-Project
Using Oxygen XML Editor to organize files.
Check the HTML Code
How to check and correct coding problems in the page files.
Convert HTML to TEI
Using Oxygen and XSLT to convert HTML files to TEI.
Procedures for converting single pages into Encyclopedia entries.
Validate Entry Files
Use Oxygen to validate the entry files.
Entry-Inventory File
Document the file names of every entry in a section using the entry-inventory file.
Proof the Entry Terms List
Compare our list of recognized entry terms with the print pages.
Reference information on file/folder names, TEI-encoding standards, and Unicode characters.
Editorial standards
The following editorial principles are employed in creating this digital edition.
Image Sources
Bibliographic information on print editions and image repositories.
Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
TEI Style Manual
All TEI encoding must follow these guidelines.
Unicode Characters
List of Unicode characters and entities that are used frequently in the Encyclopedia but are not on the standard US keyboard.