Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
Editions
eb03 | 3rd ed. 18 vols. + 2 suppl. vols. Edinburgh: A. Bell and C. MacFarquhar, 1788-1797 |
eb07 | 7th ed. 21 vols. Edinburgh: Adam and Charles Black, 1830-1842 |
eb09 | 9th ed. 25 vols. NY: Charles Scribner's Sons, 1875-1889 |
eb11 | 11th ed. 29 vols. NY: Cambridge University Press, 1910-1911 |
Page Sections
Print editions are divided into volumes as well as alphabetical sections. For OCR processing, we organize the materials around the alphabet and subdivide all entries into smaller groups of approximately 225 pages. We refer to these as page sections. Sections are designated by their alphabet letter followed by a 2-digit number for their order, beginning with 01. Thus section s01 is the first 225 pages in the letter "S," section b02 begins with the 226th page of "B," and r03 covers pages 451-675 of the letter "R."
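The section arithmetic can be sketched in Python as follows. This is only an illustration: the function name `page_section` is hypothetical, and it assumes every section holds exactly 225 pages, whereas real sections are only approximately that size.

```python
def page_section(letter: str, page_in_letter: int, size: int = 225) -> str:
    """Return the page-section name for the Nth page of a letter.

    Assumes fixed-size sections of `size` pages; actual sections may vary.
    """
    if page_in_letter < 1:
        raise ValueError("page_in_letter is counted from 1")
    # Pages 1-225 fall in section 01, pages 226-450 in section 02, and so on.
    order = (page_in_letter - 1) // size + 1
    return f"{letter.lower()}{order:02d}"


print(page_section("S", 1))    # first page of "S"
print(page_section("B", 226))  # 226th page of "B"
print(page_section("R", 451))  # 451st page of "R"
```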
Entry Sections
After pages are converted to entries, we add the print volume number to the section name, following the letter. Thus s01 becomes s1101 when it occurs in volume 11 of the print edition. Sections are eliminated after initial processing of the files; see entry Folder for more information.
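As a rough sketch of this renaming, the hypothetical helper below inserts the volume number between the letter and the 2-digit order. Zero-padding single-digit volumes to two digits is an assumption, not something stated above.

```python
import re


def entry_section(page_section_name: str, volume: int) -> str:
    """Insert the print volume number after the letter of a page-section name."""
    match = re.fullmatch(r"([a-z])(\d{2})", page_section_name)
    if match is None:
        raise ValueError(f"not a page-section name: {page_section_name!r}")
    letter, order = match.groups()
    # Assumption: volumes below 10 are zero-padded to two digits.
    return f"{letter}{volume:02d}{order}"


print(entry_section("s01", 11))
```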
Filenames
We use three phases in the process of creating machine-readable text from historical Encyclopedia Britannica editions.
- Page phase
- The process begins by creating text from single pages of the source. The OCR is output as docx files, which we transform into TEI using XSLT. Most of this work is done using page files.
- Entry phase
- We use Python to convert the page files into files containing one entry each, called entry files. These files use a different file-naming convention than the page files, adding a 2-digit sequence number to the end that indicates whether the entry was the first, second, third, etc. on the page. We also add the last four digits of the source image filename, to ensure a unique ID for each page within the entry and for the entry file itself.
- Master phase
- When we add index terms to entry files, we rename them as master files. This metadata is stored in the <teiHeader> of the master file. All digital editions created by the project are generated from these masters.
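The entry and master naming steps might be sketched as below, following the `kp-ed+vol-img-pg-seq` pattern listed in the filename table. The function names, the placeholder image name `scan_0783.jp2`, and the zero-padding widths are assumptions made for illustration; this is not the project's actual conversion script.

```python
from pathlib import PurePath


def entry_filename(edition: str, volume: int, image_name: str,
                   page: int, seq: int) -> str:
    """Build a TEI entry filename from its parts."""
    # Last four characters of the image name, ignoring its extension.
    img = PurePath(image_name).stem[-4:]
    # Assumed widths: 2-digit volume, 4-digit page, 2-digit sequence.
    return f"kp-{edition}{volume:02d}-{img}-{page:04d}-{seq:02d}.xml"


def master_filename(entry_name: str) -> str:
    """Derive the master filename by appending 'm' before the extension."""
    if not entry_name.endswith(".xml"):
        raise ValueError(f"not an entry file: {entry_name!r}")
    return entry_name[: -len(".xml")] + "m.xml"


# 'scan_0783.jp2' is a hypothetical image name used only for this example.
entry = entry_filename("eb11", 22, "scan_0783.jp2", page=765, seq=3)
print(entry)
print(master_filename(entry))
```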
Type | Formula* | Example | Information |
---|---|---|---|
ABBYY FineReader DOCX files | ed-sec-pg.docx | eb07-d02-0173.docx | 2-page-docx Folder |
entry-inventory files | ed-entries-ltr.xlsx | eb03-entries-s.xlsx | Entry-Inventory File |
Image files | Retain original filename | encyclopaediabri23chisrich_0312.jp2 | Create an Image Collection |
Oxygen XML Editor project files | ed-sec.xpr | eb07-r02.xpr | Create an OCR-Project |
page-inventory files | ed-sec.xlsx | eb07-r01.xlsx | Create a Page-Inventory File |
TEI entry files | kp-ed+vol-img-pg-seq.xml | kp-eb1122-0783-0765-03.xml | digital-editions Folder |
TEI master files | kp-ed+vol-img-pg-seq+m.xml | kp-eb1122-0783-0765-03m.xml | Master Files |
TEI page files | ed-sec-pg.xml | eb07-d02-0173.xml | 3-page-tei Folder |
* Abbreviations: ed = edition; img = last 4 digits of image name; ltr = letter; pg = page number; sec = section; seq = entry sequence on the page; vol = volume.
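To check a filename against these formulas, something like the following sketch could be used. The regular expressions are inferred from the examples in the table (for instance, 4-digit page numbers and 2-digit volumes), so the digit widths are assumptions, and `parse_filename` is a hypothetical helper.

```python
import re

# Page-phase files: ed-sec-pg.docx or ed-sec-pg.xml
PAGE_FILE = re.compile(
    r"(?P<ed>eb\d{2})-(?P<sec>[a-z]\d{2})-(?P<pg>\d{4})\.(?P<ext>docx|xml)"
)
# Entry and master files: kp-ed+vol-img-pg-seq.xml, with a trailing 'm' for masters
ENTRY_FILE = re.compile(
    r"kp-(?P<ed>eb\d{2})(?P<vol>\d{2})-(?P<img>\d{4})-(?P<pg>\d{4})"
    r"-(?P<seq>\d{2})(?P<m>m?)\.xml"
)


def parse_filename(name: str) -> dict:
    """Split a page, entry, or master filename into its labelled parts."""
    match = ENTRY_FILE.fullmatch(name)
    if match:
        parts = match.groupdict()
        parts["type"] = "master" if parts.pop("m") else "entry"
        return parts
    match = PAGE_FILE.fullmatch(name)
    if match:
        parts = match.groupdict()
        parts["type"] = "page-docx" if parts.pop("ext") == "docx" else "page-tei"
        return parts
    raise ValueError(f"unrecognized filename: {name!r}")


print(parse_filename("kp-eb1122-0783-0765-03m.xml"))
print(parse_filename("eb07-d02-0173.docx"))
```

Using `fullmatch` rather than `search` means a name with stray characters before or after the pattern is rejected instead of being partially accepted.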
Folder Names
Type | Formula | Example | Information |
---|---|---|---|
AFR dictionary folder | predefined name | UserDictionaries | Areas and Text Tab |
AFR user pattern and languages | predefined name | UserPatterns | OCR Tab |
docx folder | predefined name | 2-page-docx | 2-page-docx Folder |
OCR project folder | ed-sec | eb11-t02 | 1-afr-project Folder |
print edition folder | edition | eb11 | Print Edition Folder |
TEI entry folder | letter+vol | eb03/entry/a01 | entry Folder |
TEI master folder | letter+vol | eb03/master/a01 | entry Folder |
TEI page folder | predefined name | eb03/page/a01 | 3-page-tei Folder |
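The folder formulas can be sketched in the same way. The helper names below are hypothetical, and the example paths in the table are read here as edition folder, then phase folder, then letter+vol with a 2-digit volume; that reading is an assumption.

```python
from pathlib import PurePosixPath


def ocr_project_folder(edition: str, section: str) -> str:
    """OCR project folder name: ed-sec, for example eb11-t02."""
    return f"{edition}-{section}"


def tei_folder(edition: str, phase: str, letter: str, volume: int) -> PurePosixPath:
    """Path of a TEI entry or master folder: edition/phase/letter+vol."""
    if phase not in {"entry", "master"}:
        raise ValueError("phase must be 'entry' or 'master'")
    # Assumption: the volume is zero-padded to two digits, as in 'a01'.
    return PurePosixPath(edition, phase, f"{letter}{volume:02d}")


print(ocr_project_folder("eb11", "t02"))
print(tei_folder("eb03", "entry", "a", 1))
print(tei_folder("eb03", "master", "a", 1))
```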