Edition-Section System
File organization depends on two basic folder types
In the early stage of processing, we work on artificial groups of 150-250 printed pages at a time. These groups are identified with the edition-section system, which we use for all processing work until entry files are created and validated. At that point, we reorganize the entries by alphabetical letter within each edition.
- Edition
- We are digitizing four editions of the Encyclopedia Britannica. We organize all text initially by the print edition number, using the naming convention eb followed by a two-digit code for the edition. Thus eb07 is the seventh edition.
- Section
- Each edition is organized alphabetically, so we subdivide each edition by letter. The full text of some letters is enormous, so we further segment each letter into numbered page sections. Section names begin with the letter, followed by a two-digit code for the section. Thus b03 is the third section of the letter "B."
Combining the edition and section names with a hyphen gives a shorthand for a working segment of text. For example, eb11-w02 is the second section of the letter "W" in the eleventh edition. We use this edition-section system to organize our workflow during the OCR process.
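The naming convention can be checked and generated mechanically. The following is a minimal Python sketch, not part of the project's actual tooling: the names `EditionSection`, `parse_edition_section`, and `format_edition_section` are hypothetical, and the sketch assumes identifiers are always lowercase with a single-letter section prefix, as in eb11-w02.

```python
import re
from typing import NamedTuple


class EditionSection(NamedTuple):
    """Hypothetical container for one working segment, e.g. eb11-w02."""
    edition: int   # print edition number, e.g. 11
    letter: str    # alphabetical letter of the section, e.g. "w"
    section: int   # numbered page section within the letter, e.g. 2


# Assumed shape: "eb" + two-digit edition + "-" + one letter + two-digit section.
_NAME_PATTERN = re.compile(r"eb([0-9]{2})-([a-z])([0-9]{2})")


def parse_edition_section(name: str) -> EditionSection:
    """Split an identifier such as 'eb11-w02' into its three parts."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an edition-section name: {name!r}")
    edition, letter, section = match.groups()
    return EditionSection(int(edition), letter, int(section))


def format_edition_section(edition: int, letter: str, section: int) -> str:
    """Build an identifier from its parts, zero-padding both codes."""
    return f"eb{edition:02d}-{letter.lower()}{section:02d}"


segment = parse_edition_section("eb11-w02")
print(segment.edition, segment.letter, segment.section)
print(format_edition_section(7, "b", 3))
```

Parsing into named fields rather than slicing the string by position keeps the edition and section codes from being confused with each other, and the strict pattern rejects malformed folder names early.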