Folder names

As the OCR workflow passes through its stages, the files produced at each stage are stored in a dedicated folder. The folder names, repositories, and functions are given below:


folder name	repository	function
1-afr-project	eb03, eb07, eb09, eb11	ABBYY FineReader has a proprietary compressed folder structure for storing its OCR data. See 1-afr-project Folder.
2-page-docx	outputs	We save our OCR results in Word's docx format, with text from one printed page per file. See 2-page-docx Folder.
3-page-tei	outputs	Each docx file is transformed into the TEI format as a page file, with each file containing text from one printed page. See 3-page-tei Folder.
entry	outputs	The TEI page files are separated into individual entry files, with each file containing a complete entry. See Convert Pages to Entries.
metadata	outputs	Each entry file is processed by HIVE2, which automatically generates subject headings for the file and outputs them as CSV files. These metadata files are stored here.
master	outputs	We import the subject terms from the metadata files into the TEI header of the entry files. The new files become our master files for future editions.
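
The folder layout in the table can be captured in a small script, which is handy for checking that a working copy of the outputs repository has every stage folder in place. The following is a minimal sketch, not part of the project's tooling: the function names, the `eb-volume` placeholder (standing in for one of eb03, eb07, eb09, eb11), and the assumption that the outputs repository is a local directory are all hypothetical.

```python
from pathlib import Path

# Folder names from the table above, in workflow order, paired with the
# repository each one lives in. "eb-volume" is a hypothetical placeholder
# for one of eb03, eb07, eb09, eb11.
STAGE_FOLDERS = [
    ("1-afr-project", "eb-volume"),
    ("2-page-docx", "outputs"),
    ("3-page-tei", "outputs"),
    ("entry", "outputs"),
    ("metadata", "outputs"),
    ("master", "outputs"),
]


def output_folders(outputs_root):
    """Return the paths of the stage folders kept in the outputs repository."""
    root = Path(outputs_root)
    return [root / name for name, repo in STAGE_FOLDERS if repo == "outputs"]


def missing_folders(outputs_root):
    """List expected stage folders that do not exist yet under outputs_root."""
    return [path for path in output_folders(outputs_root) if not path.is_dir()]
```

Calling `missing_folders("path/to/outputs")` on a local checkout would return the stage folders that still need to be created before the corresponding stage can write its files.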