Folder names

As the OCR workflow progresses, the output of each stage moves into its own folder. The folder names and contents are given below:

folder name | repository | function
1-afr-project | eb03, eb07, eb09, eb11 | ABBYY FineReader stores its OCR data in a proprietary compressed folder structure. See 1-afr-project Folder.
2-page-docx | outputs | We save our OCR results in Word's docx format, with text from one printed page per file. See 2-page-docx Folder.
3-page-tei | outputs | Each docx file is transformed into a TEI page file, with each file containing text from one printed page. See 3-page-tei Folder.
entry | outputs | The TEI page files are separated into individual entry files, with each file containing a complete entry. See Convert Pages to Entries.
metadata | outputs | Each entry file is processed by HIVE2, which automatically generates subject headings for the file and outputs them as CSV files. These metadata files are stored here.
master | outputs | We import the subject terms from the metadata files into the TEI header of the entry files. The new files become our master files for future editions.
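
The last step, importing subject terms into the TEI header, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual script: the layout of the HIVE2 CSV files is not described here, so the sketch assumes one subject term per row in the first column, and it places the terms under profileDesc/textClass/keywords, which is a common TEI convention but may differ from the project's own header design. The function names read_terms and add_subject_terms are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def read_terms(csv_text):
    """Return non-empty first-column values (ASSUMED metadata CSV layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


def add_subject_terms(tei_xml, terms):
    """Return the entry XML with each term added to the TEI header.

    Terms go under teiHeader/profileDesc/textClass/keywords (ASSUMED
    placement); missing intermediate elements are created as needed.
    """
    root = ET.fromstring(tei_xml)
    node = root.find(f"{{{TEI_NS}}}teiHeader")
    if node is None:
        raise ValueError("entry file has no teiHeader element")
    for name in ("profileDesc", "textClass", "keywords"):
        child = node.find(f"{{{TEI_NS}}}{name}")
        if child is None:
            child = ET.SubElement(node, f"{{{TEI_NS}}}{name}")
        node = child
    for term in terms:
        ET.SubElement(node, f"{{{TEI_NS}}}term").text = term
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Toy inputs for illustration only; real entry and metadata files differ.
    entry = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader/><text><body><p>Sample entry</p></body></text></TEI>"
    )
    print(add_subject_terms(entry, read_terms("Botany\nAgriculture\n")))
```

In practice the two functions would be called once per entry file, reading from the metadata folder and writing the result to the master folder, so the entry files themselves stay unchanged.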