Numbered_folder_names

We number certain folders to track the Nineteenth-Century Knowledge Project workflow. Their names and contents are given below:

foldername | repository | function
1-afr-project | eb03, eb07, eb09, eb11 | ABBYY FineReader has a proprietary compressed folder structure for storing its OCR data. See 1-afr-project Folder.
2-page-docx | outputs | We save our OCR results in Word's docx format, with text from one printed page per file. See 2-page-docx Folder.
3-page-tei | outputs | Each docx file is transformed into the TEI format as a page file, with each file containing text from one printed page. See 3-page-tei Folder.
4-entry-tei | outputs | The TEI page files are combined and segmented into entry files, with each file containing a single, complete entry. See Convert Pages to Entries.
5-entry-md | outputs | Each entry file is processed by HIVE, which automatically generates subject headings for the file and outputs them as a CSV file. These entry metadata files are stored here.
6-master-tei | outputs | We use Python to import the subject terms from the 5-entry-md files into the TEI Header of the 4-entry-tei files. The result is a properly encoded TEI file with relevant subject headings for every entry in the Encyclopedia Britannica.
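
The 6-master-tei step can be illustrated with a minimal Python sketch. It is not the project's actual script. It assumes that each HIVE CSV file has a header row with a column named "term" (a hypothetical column name), and that subject headings belong in profileDesc/textClass/keywords inside the TEI Header, which is one conventional TEI location for them. The function names read_terms, child and add_subject_terms are made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def read_terms(csv_text, column="term"):
    """Return the non-empty values of one column of a CSV string.

    The column name "term" is an assumption about the HIVE output.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def child(parent, name):
    """Return the named TEI child of parent, creating it if absent."""
    tag = f"{{{TEI_NS}}}{name}"
    found = parent.find(tag)
    # A newly created element is appended as the last child; a real
    # script may need to place it in schema order instead.
    return found if found is not None else ET.SubElement(parent, tag)


def add_subject_terms(tei_xml, terms):
    """Add one <term> per subject heading to the TEI Header."""
    root = ET.fromstring(tei_xml)
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    if header is None:
        raise ValueError("no teiHeader element found")
    keywords = child(child(child(header, "profileDesc"), "textClass"), "keywords")
    for term in terms:
        ET.SubElement(keywords, f"{{{TEI_NS}}}term").text = term
    return ET.tostring(root, encoding="unicode")
```

In use, the text of one 5-entry-md file would be passed to read_terms, and the resulting list, together with the text of the matching 4-entry-tei file, to add_subject_terms. The returned string would then be written to 6-master-tei. How entry files are paired with their metadata files (for example, by a shared file name) is left out of this sketch.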