Naming Conventions

Lists the naming conventions we use for editions, sections, folders, and files.

Attention:
This page is no longer being updated and is posted for information only.

Editions

eb03 3rd ed. 18 vols. + 2 suppl. vols. Edinburgh: A. Bell and C. MacFarquhar, 1788-1797
eb07 7th ed. 21 vols. Edinburgh: Adam and Charles Black, 1830-1842
eb09 9th ed. 25 vols. NY: Charles Scribner's Sons, 1875-1889
eb11 11th ed. 29 vols. NY: Cambridge University Press, 1910-1911

Page Sections

Print editions are divided into volumes as well as alphabetical sections. For OCR processing, we organize the materials by letter of the alphabet and subdivide each letter's entries into smaller groups of approximately 225 pages. We refer to these groups as page sections. A section is designated by its letter followed by a 2-digit number giving its order, beginning with 01. Thus section s01 is the first 225 pages of the letter "S," section b02 begins with the 226th page of "B," and r03 covers pages 451-675 of the letter "R."
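
The numbering rule can be sketched in Python as follows. This is only an illustration, assuming every section holds exactly 225 pages; because the text gives 225 as an approximate size, real section boundaries may differ. The names page_section and SECTION_SIZE are hypothetical and do not come from the project's scripts.

```python
SECTION_SIZE = 225  # assumed exact here; the project describes it as approximate


def page_section(letter: str, page_in_letter: int, size: int = SECTION_SIZE) -> str:
    """Return the page-section name for the Nth page (1-based) of a letter."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("letter must be a single alphabetic character")
    if page_in_letter < 1:
        raise ValueError("page_in_letter is 1-based and must be >= 1")
    # Pages 1-225 fall in section 01, pages 226-450 in section 02, and so on.
    order = (page_in_letter - 1) // size + 1
    return f"{letter.lower()}{order:02d}"
```

Under that assumption, page_section("B", 226) gives "b02" and page_section("R", 451) gives "r03", matching the examples in the paragraph above.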

Entry Sections

After pages are converted to entries, we add the print volume number to the section name, following the letter. Thus s01 becomes s1101 when it occurs in volume 11 of the print edition. Sections are eliminated after initial processing of the files. See entry Folder for more information.
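
A possible way to derive the entry-section name from a page-section name is sketched below. The function name entry_section is hypothetical, and zero-padding the volume number to two digits is an assumption; the text only shows a two-digit volume (11).

```python
import re

# A page-section name: one lowercase letter followed by a 2-digit order number.
_PAGE_SECTION = re.compile(r"^([a-z])(\d{2})$")


def entry_section(section: str, volume: int) -> str:
    """Insert the print volume number after the letter of a page-section name."""
    match = _PAGE_SECTION.match(section)
    if match is None:
        raise ValueError(f"not a page-section name: {section!r}")
    if not 1 <= volume <= 99:
        raise ValueError("volume must be between 1 and 99")
    letter, order = match.groups()
    # Assumption: single-digit volumes are zero-padded to two digits.
    return f"{letter}{volume:02d}{order}"
```

For example, entry_section("s01", 11) returns "s1101".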

Filenames

We use three phases in the process of creating machine-readable text from historical Encyclopedia Britannica editions.

Page phase
The process begins by creating text from single pages of the source. The OCR is output as docx files, which we transform into TEI using XSLT. Most of this work is done using page files.
Entry phase
We use Python to convert the page files into entry files, each containing a single entry. Entry files follow a different naming convention than page files: a 2-digit sequence number at the end indicates whether the entry was the first, second, third, etc. on the page. We also add the last four digits of the source image filename, to ensure a unique ID for each page within the entry and for the entry file itself.
Master phase
Once we add index terms to an entry file, we rename it as a master file. This metadata is stored in the <teiHeader> of the master file. All digital editions created by the project are generated from these masters.

Type                            | Formula*                   | Example                             | Information
ABBYY FineReader DOCX files     | ed-sec-pg.docx             | eb07-d02-0173.docx                  | 2-page-docx Folder
entry-inventory files           | ed-entries-ltr.xlsx        | eb03-entries-s                      | Entry-Inventory File
Image files                     | Retain original filename   | encyclopaediabri23chisrich_0312.jp2 | Create an Image Collection
Oxygen XML Editor project files | ed-sec.xpr                 | eb07-r02.xpr                        | Create an OCR-Project
page-inventory files            | ed-sec.xlsx                | eb07-r01.xlsx                       | Create a Page-Inventory File
TEI entry files                 | kp-ed+vol-img-pg-seq.xml   | kp-eb1122-0783-0765-03.xml          | digital-editions Folder
TEI master files                | kp-ed+vol-img-pg-seq+m.xml | kp-eb1122-0783-0765-03m.xml         | Master Files
TEI page files                  | ed-sec-pg.xml              | eb07-d02-0173.xml                   | 3-page-tei Folder

* Abbreviations: ed = edition; img = last 4 digits of image name; ltr = letter; pg = page number; sec = section; seq = entry sequence on the page; vol = volume.
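
To make the entry and master formulas concrete, the sketch below parses a TEI entry or master filename into its parts and derives the master name from an entry name. The names EntryName, parse_entry_name, and master_name are hypothetical. The field widths (2-digit volume, 4-digit image and page numbers, 2-digit sequence) are assumptions inferred from the single example in the table, and "kp" is treated as a fixed literal prefix.

```python
import re
from typing import NamedTuple


class EntryName(NamedTuple):
    edition: str    # ed, e.g. "eb11"
    volume: str     # vol, e.g. "22"
    image: str      # img, last 4 digits of the image name
    page: str       # pg, page number
    sequence: str   # seq, entry sequence on the page
    is_master: bool


# Assumed widths, based on the example kp-eb1122-0783-0765-03.xml.
_ENTRY = re.compile(
    r"^kp-(?P<edition>eb\d{2})(?P<volume>\d{2})"
    r"-(?P<image>\d{4})-(?P<page>\d{4})-(?P<sequence>\d{2})"
    r"(?P<master>m?)\.xml$"
)


def parse_entry_name(filename: str) -> EntryName:
    """Split a TEI entry or master filename into its named parts."""
    match = _ENTRY.match(filename)
    if match is None:
        raise ValueError(f"not a TEI entry or master filename: {filename!r}")
    return EntryName(
        match["edition"], match["volume"], match["image"],
        match["page"], match["sequence"], match["master"] == "m",
    )


def master_name(entry_filename: str) -> str:
    """Return the master filename for an entry filename (adds 'm' before .xml)."""
    if parse_entry_name(entry_filename).is_master:
        return entry_filename  # already a master filename
    return entry_filename[: -len(".xml")] + "m.xml"
```

With the table's example, parse_entry_name("kp-eb1122-0783-0765-03.xml") yields edition "eb11", volume "22", image "0783", page "0765", and sequence "03", and master_name on the same input returns "kp-eb1122-0783-0765-03m.xml".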

Folder Names

Type                           | Formula         | Example          | Information
AFR dictionary folder          | predefined name | UserDictionaries | Areas and Text Tab
AFR user pattern and languages | predefined name | UserPatterns     | OCR Tab
docx folder                    | predefined name | 2-page-docx      | 2-page-docx Folder
OCR project folder             | ed-sec          | eb11-t02         | 1-afr-project Folder
print edition folder           | edition         | eb11             | Print Edition Folder
TEI entry folder               | letter+vol      | eb03/entry/a01   | entry Folder
TEI master folder              | letter+vol      | eb03/master/a01  | entry Folder
TEI page folder                | predefined name | eb03/page/a01    | 3-page-tei Folder
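
The TEI folder examples share an edition/phase/section layout, which could be assembled as in the sketch below. The function name tei_folder and the PHASES tuple are hypothetical, and the three-level layout is inferred only from the examples eb03/page/a01, eb03/entry/a01, and eb03/master/a01; the section name is passed through unchanged.

```python
from pathlib import PurePosixPath

# Phase folder names as they appear in the examples above.
PHASES = ("page", "entry", "master")


def tei_folder(edition: str, phase: str, section: str) -> PurePosixPath:
    """Build a relative TEI folder path such as eb03/entry/a01."""
    if phase not in PHASES:
        raise ValueError(f"phase must be one of {PHASES}, got {phase!r}")
    return PurePosixPath(edition) / phase / section
```

For instance, str(tei_folder("eb03", "master", "a01")) gives "eb03/master/a01".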