Prep and Convert Pages to Entry Files

Before page files can be converted to entry files, we need to do some housekeeping.

We clean up the page files so that they convert cleanly into entry files. All changes are made in the TEI file, leaving the docx file untouched.

We use the Python script note_counts.py to correct common errors in note encoding: each file must have the same number of note anchors (@@) and note texts (@@@), and every note must be on a separate line.
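The two rules above can be illustrated with a minimal sketch. It is not note_counts.py and it only detects problems rather than correcting them. It assumes that anchors appear as literal `@@` and note texts begin with a literal `@@@` in the TEI file's text. The names `check_notes` and `NoteReport` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed marker conventions (hypothetical): "@@" marks a note anchor and
# "@@@" marks note text. The lookarounds stop the two-character anchor
# pattern from matching inside a three-character note marker.
ANCHOR_RE = re.compile(r"(?<!@)@@(?!@)")
NOTE_RE = re.compile(r"(?<!@)@@@(?!@)")


@dataclass
class NoteReport:
    anchors: int = 0
    notes: int = 0
    # 1-based line numbers that contain more than one note marker
    crowded_lines: list[int] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Counts must match and every note must sit on its own line.
        return self.anchors == self.notes and not self.crowded_lines


def check_notes(text: str) -> NoteReport:
    """Count anchors and note texts, and flag lines holding several notes."""
    report = NoteReport()
    for lineno, line in enumerate(text.splitlines(), start=1):
        report.anchors += len(ANCHOR_RE.findall(line))
        notes_here = len(NOTE_RE.findall(line))
        report.notes += notes_here
        if notes_here > 1:
            report.crowded_lines.append(lineno)
    return report


if __name__ == "__main__":
    sample = "Body text@@ and more@@.\n@@@ First note.\n@@@ Second note.\n"
    r = check_notes(sample)
    print(r.anchors, r.notes, r.crowded_lines, r.ok)  # prints: 2 2 [] True
```

A file that fails either rule (unequal counts, or two `@@@` markers on one line) reports `ok` as False, and `crowded_lines` points to the lines to split.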

When the pages are scanned, the operators record anomalies, such as a footnote that runs over to the next page, as comments in the page-inventory file. We address these comments in this preparation phase.