Folder names
As production passes through the stages of the OCR workflow, its files move into a dedicated folder for each stage. The folder names, the repositories that hold them, and their contents are listed below:
folder name | repository | function |
---|---|---|
1-afr-project | eb03, eb07, eb09, eb11 | ABBYY FineReader has a proprietary compressed folder structure for storing its OCR data. See 1-afr-project Folder. |
2-page-docx | outputs | We save our OCR results in Word's docx format, with text from one printed page per file. See 2-page-docx Folder. |
3-page-tei | outputs | Each docx file is transformed into a TEI page file, with each file again containing the text of one printed page. See 3-page-tei Folder. |
entry | outputs | The TEI page files are split into individual entry files, each containing one complete entry. See Convert Pages to Entries. |
metadata | outputs | Each entry file is processed by HIVE2, which automatically generates subject headings for the file and outputs them as CSV files. These metadata files are stored here. |
master | outputs | We import the subject terms from the metadata files into the TEI headers of the entry files. The resulting files become our master files for future editions. |
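
The last two rows describe a merge step: subject terms from the metadata folder are copied into the TEI header of each entry file, and the result is saved in the master folder. The Python sketch below shows one way this step could work using only the standard library. It is not the project's actual tooling, and it rests on several hypothetical assumptions: each HIVE2 CSV has a header row with a `term` column holding one subject heading per row; an entry file and its metadata file share a base name (for example `entry/x.xml` and `metadata/x.csv`); entry files use the `.xml` extension; and the terms belong in `teiHeader/profileDesc/textClass/keywords`. The function names `build_master_files`, `read_subject_terms` and `add_terms_to_header` are illustrative only.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Register TEI as the default namespace so output tags are not written as "ns0:...".
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Qualify a TEI element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def read_subject_terms(csv_path, column="term"):
    """Return the non-empty subject headings in one metadata CSV file."""
    # Assumption (hypothetical HIVE2 layout): a header row with a "term" column.
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = csv.DictReader(f)
        return [(row.get(column) or "").strip() for row in rows
                if (row.get(column) or "").strip()]


def add_terms_to_header(root, terms):
    """Append a <keywords> list of <term>s under teiHeader/profileDesc/textClass."""
    header = root.find(tei("teiHeader"))
    if header is None:
        raise ValueError("entry file has no teiHeader")
    profile = header.find(tei("profileDesc"))
    if profile is None:
        # revisionDesc must be the last child of teiHeader, so insert before it if present.
        profile = ET.Element(tei("profileDesc"))
        revision = header.find(tei("revisionDesc"))
        position = list(header).index(revision) if revision is not None else len(header)
        header.insert(position, profile)
    text_class = profile.find(tei("textClass"))
    if text_class is None:
        text_class = ET.SubElement(profile, tei("textClass"))
    keywords = ET.SubElement(text_class, tei("keywords"))
    for term in terms:
        ET.SubElement(keywords, tei("term")).text = term


def build_master_files(outputs_dir):
    """Merge metadata/<name>.csv into entry/<name>.xml and write master/<name>.xml."""
    outputs = Path(outputs_dir)
    master_dir = outputs / "master"
    master_dir.mkdir(exist_ok=True)
    written = []
    for entry_path in sorted((outputs / "entry").glob("*.xml")):
        # Assumption: entry and metadata files share the same base name.
        csv_path = outputs / "metadata" / f"{entry_path.stem}.csv"
        if not csv_path.exists():
            continue  # no HIVE2 metadata for this entry yet
        tree = ET.parse(entry_path)
        add_terms_to_header(tree.getroot(), read_subject_terms(csv_path))
        out_path = master_dir / entry_path.name
        tree.write(out_path, encoding="utf-8", xml_declaration=True)
        written.append(out_path)
    return written
```

Pointed at the root of the outputs repository, `build_master_files("outputs")` would write one master file for each entry that already has a metadata file and skip the rest. Because every run starts again from the files in the entry folder and overwrites the master copy, rerunning it does not duplicate terms.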