Organization
How to keep hundreds of thousands of files organized.
Attention: This section is no longer maintained and is posted for information only.
With hundreds of thousands of files, the Nineteenth-Century Knowledge Project needs a clear means of organizing its data. We use specific naming conventions for all files and folders to order the material. This section specifies those conventions.
Our workflow has three distinct stages, and this manual is organized around them:
- Page Files
  - All processing steps for capturing text from an individual printed page and converting it to the TEI format.
- Entry Files
  - Text from single pages is converted into complete Encyclopedia entries, with one entry per file. Automated cleanup routines correct the most common OCR errors.
- Master Files
  - Each entry file is automatically analyzed, and index terms (subject headings) and linked open data are added to its metadata.
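The order of the three stages can be sketched as a small Python enumeration. This is only an illustration: the names `Stage` and `next_stage` are hypothetical and are not part of the project's tooling. It shows only that each file type feeds the next.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Hypothetical labels for the three workflow stages, in processing order."""

    PAGE = 1    # text captured from one printed page and converted to TEI
    ENTRY = 2   # page text converted into one entry per file, OCR cleanup applied
    MASTER = 3  # entry analyzed; index terms and linked open data added to metadata


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last stage."""
    return Stage(stage + 1) if stage < Stage.MASTER else None
```

For example, `next_stage(Stage.PAGE)` yields `Stage.ENTRY`, and `next_stage(Stage.MASTER)` yields `None` because master files are the final product.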