Entry folder
The entry folder contains the TEI-encoded data produced when the page files are converted to entry files.
Entry files contain one entry per file. The entry is the basic unit of meaning in the Encyclopedia, and it may range in size from a single sentence to a book-length exposition.
We create entry files by running a Python script on a section of the TEI page files. The script combines all page files in that section and then splits the data at each entry term, writing a new file for each entry. Each entry file preserves the original page numbers as well as the references to the source image for each page.
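The combine-then-segment step can be pictured with a minimal sketch. This is not the project's script: it works on plain text lines rather than TEI, and the names `Page`, `Entry`, `segment_entries`, and `uppercase_headword` are hypothetical. In particular, the rule that an entry begins with an uppercase headword followed by a comma is only an assumption for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Page:
    number: str        # original print page number
    image: str         # reference to the source page image
    lines: list[str]   # text lines of the page, in reading order


@dataclass
class Entry:
    term: str
    # Each line keeps its (page number, image reference, text) provenance.
    lines: list[tuple[str, str, str]] = field(default_factory=list)


def uppercase_headword(line: str) -> Optional[str]:
    """Hypothetical rule: an entry starts with an uppercase word and a comma."""
    match = re.match(r"([A-Z]+),", line)
    return match.group(1) if match else None


def segment_entries(
    pages: Iterable[Page],
    entry_term: Callable[[str], Optional[str]],
) -> list[Entry]:
    """Read pages in order and start a new entry at every entry term."""
    entries: list[Entry] = []
    for page in pages:
        for line in page.lines:
            term = entry_term(line)
            if term is not None:
                entries.append(Entry(term))
            # Lines before the first entry term are skipped in this sketch.
            if entries:
                entries[-1].lines.append((page.number, page.image, line))
    return entries
```

Because every line carries its page number and image reference, an entry that runs across a page break still records where each part came from.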
Entry files use different file and folder naming conventions than page files. The revised folder name includes the print volume number. The revised filename creates a unique identifier for each entry and more precisely indicates the entry location in the print source.
Entry folder names
Entry folders have two different naming conventions, depending on whether the files are still in process or processing is completed.
IN PROCESS
letter + volume + batch
- "letter" is the section of the alphabet for the entry.
- "volume" is the volume of the print edition.
- "batch" is a 2-digit sequence for the subset of entries for the letter.
In the figure below, a0105 includes "A" entries from volume one and is the fifth batch of "A" entries. a0206 is the next batch of "A" entries, which are located in volume two.
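As an illustration of the in-process convention, the following sketch splits a folder name such as a0105 into its parts and builds one from its parts. It assumes a single lowercase letter, a 2-digit volume, and a 2-digit batch, which matches the examples above but is not stated as a rule for the volume. The names `BatchFolder`, `parse_batch_folder`, and `format_batch_folder` are hypothetical.

```python
import re
from typing import NamedTuple


class BatchFolder(NamedTuple):
    letter: str
    volume: int
    batch: int


# Assumed layout: one letter, 2-digit volume, 2-digit batch (e.g. "a0105").
_FOLDER_RE = re.compile(r"^([a-z])(\d{2})(\d{2})$")


def parse_batch_folder(name: str) -> BatchFolder:
    """Split an in-process folder name into letter, volume, and batch."""
    match = _FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"not an in-process folder name: {name!r}")
    letter, volume, batch = match.groups()
    return BatchFolder(letter, int(volume), int(batch))


def format_batch_folder(letter: str, volume: int, batch: int) -> str:
    """Build an in-process folder name from its three parts."""
    return f"{letter.lower()}{volume:02d}{batch:02d}"
```

For example, `parse_batch_folder("a0105")` yields letter "a", volume 1, batch 5, and `format_batch_folder("A", 2, 6)` gives "a0206".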
AFTER PROCESSING
Once complete, we combine all entries into alphabetical folders based on their entry terms.
Entry file names
kp + print-edition & volume + image-sequence + page-number + position-on-the-page
- The image-sequence is a 4-digit number taken from the filename of the page scan image.
- The position-on-the-page is a 2-digit number indicating whether the entry appears first, second, or third on that page.
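A rough sketch of how such an identifier might be assembled follows. Only the "kp" prefix, the 4-digit image-sequence, and the 2-digit position-on-the-page come from the description above. The hyphen separator, the unpadded page number, and passing the print-edition and volume part as a ready-made string are assumptions made for illustration, and `entry_file_id` is a hypothetical name.

```python
def entry_file_id(
    edition_volume: str,
    image_sequence: int,
    page_number: int,
    position: int,
    sep: str = "-",  # assumption: the real separator is not documented here
) -> str:
    """Assemble an entry identifier from its documented components."""
    if not 0 <= image_sequence <= 9999:
        raise ValueError("image-sequence must fit in 4 digits")
    if not 1 <= position <= 99:
        raise ValueError("position-on-the-page must fit in 2 digits")
    parts = [
        "kp",
        edition_volume,            # print-edition & volume, layout assumed
        f"{image_sequence:04d}",   # 4-digit number from the scan filename
        str(page_number),          # assumption: page number is not padded
        f"{position:02d}",         # 2-digit position on the page
    ]
    return sep.join(parts)
```

Zero-padding the image-sequence and position keeps the identifiers sortable as plain strings, so entries from the same page list in reading order.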