Descriptions of OCR processing, TEI transformation, and metadata creation

Attention: This section is no longer maintained and is posted for information only.

From 2015 to 2021, students working with Dr. Logan at Temple University produced original OCR for the full text of the 3rd, 7th, and 9th editions of the Encyclopedia Britannica and for portions of the 11th edition. The remainder of the 11th edition was filled in with files produced earlier by Don Kretz (Distributed Proofreading) for Project Gutenberg.


The Organization, Page Files, and Entry Files sections detail the procedures followed in the OCR process and the steps used to convert raw OCR output into TEI files.

These sections were created for the use of students working on the project and are retained here as a record of the project's production methods.


During the same period, we worked with Dr. Jane Greenberg and the Metadata Research Center at Drexel University to develop an automated method of generating index terms for every entry. That process and its incorporation into the production of the editions are detailed in the Master Files section.