What is the Knowledge Project?
Editorial standards
The following editorial principles are employed in creating this digital edition.
Staff and contributors
With over 100,000 files, the Knowledge Project needs a clear means of organizing its data. We use specific naming conventions for all files and folders to keep this large quantity of material in order.
Archive Folder
This is our long-term storage for original image files of print editions of the Encyclopedia. We never modify these files. Instead, we copy them to the production/images folder for active use.
Digital Edition Folder
The second stage in the project is creating editions that we can analyze or place on the web. All files connected with either generating an edition or running an analysis of its data are contained here.
Information Folder
We use this folder to organize useful information on TEI, OCR, metadata, the history of the Encyclopedia Britannica, and more, including this manual.
Production Folder
The first stage of the Nineteenth-Century Knowledge Project is creating the textual data from page images of the Encyclopedia Britannica. All files in this initial process are contained here.
An introduction to AFR that explains the specific procedures we use to get the best-quality text recognition from Encyclopedia page images.
AFR Interface
This section describes the main elements of the OCR interface.
Create an OCR-Project Folder
How to create and manage the OCR-Project folder.
Draw Boxes
Manually creating text recognition boxes improves accuracy.
Page Recognition
Excellent page recognition depends on preparing the page properly.
Save and Output
How to output your OCR results.
This introduction to Oxygen XML Editor shows you how to navigate the interface and perform standard procedures on the Encyclopedia files.
Oxygen Interface
An introduction to the main components of the Oxygen interface.
Create an XML-Project
Using Oxygen XML Editor to organize files.
Check the HTML Code
How to check and correct coding problems in the page files.
Convert HTML to TEI
Using Oxygen and XSLT to convert HTML files to TEI.
Procedures for converting single pages into Encyclopedia entries.
Validate Entry Files
Use Oxygen to validate the entry files.
Entry-Inventory File
Document the file names of every entry in a section using the entry-inventory file.
Proof the Entry Terms List
Compare our list of recognized entry terms with the print pages.
Reference information on file/folder names, TEI-encoding standards, and more!
Image Sources
Bibliographic information on print editions and image repositories.
Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
TEI Style Manual
All TEI encoding must follow these guidelines.
Unicode Characters
List of Unicode characters and entities that appear frequently in the Encyclopedia but are not on the standard US keyboard.