Page Numbers

Specifies the encoding method for page numbers in TEI.

All page numbers must be encoded, along with the edition and volume number of the print original. In addition, we want to include information about the image file that was used in the OCR process for the page.

Page information is included in two different places:
  1. The <div> element at the start of each entry.
  2. The <pb> element inserted by Python to indicate the beginning of a new page in the entry.

The <div> element

<div facs="encyclopediabrit24newyrich_raw_0182.jp2" type="entry"
                xml:id="kp-eb0924-0182-0164-04" xmlns="http://www.tei-c.org/ns/1.0">
  1. In this example, we use our standard file-naming practice for the @xml:id value, including the final two-digit code for the entry position on the page.
  2. Use @facs to identify the source information, in this case, the filename of the image used in the OCR process. That data is taken from the Page-Inventory File. The last four digits of the image filename are inserted into the @xml:id between the volume and page numbers.
  3. Add @type with the value entry to indicate that the text is an encyclopedia entry.
  4. Occasionally, page numbers in the print editions are obviously in error. In such cases, use the corrected page number in @xml:id, then add @n with the value misnumbered nnn, replacing nnn with the printed page number. This indicates that the printed page number was out of sequence.
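
The way the @xml:id is assembled can be sketched in Python. This is a minimal illustration, not the project's actual script: the function names entry_xml_id and image_number are hypothetical, and the field widths, as well as the reading of "0924" as edition 09 plus volume 24, are inferred from the sample value above.

```python
def image_number(facs):
    """Return the four digits that end the image filename, before the extension."""
    stem = facs.rsplit(".", 1)[0]  # drop ".jp2"
    return int(stem[-4:])


def entry_xml_id(edition, volume, image_no, page, position):
    """Build an entry @xml:id in the pattern of the sample value.

    Assumption: two digits each for edition, volume, and entry position,
    and four digits each for image number and page number.
    """
    return (
        f"kp-eb{edition:02d}{volume:02d}"
        f"-{image_no:04d}-{page:04d}-{position:02d}"
    )


facs = "encyclopediabrit24newyrich_raw_0182.jp2"
print(entry_xml_id(9, 24, image_number(facs), 164, 4))
```

For a misnumbered page, pass the corrected page number here and record the printed one separately in @n.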

The <pb> element

<pb facs="encyclopediabrit24newyrich_raw_0183.jp2" xml:id="kp-eb0924-0183-0165"/>

When encoding page beginnings, do not use the two-digit code for the entry position, since that only indicates the starting position of the entry. Instead, include an @xml:id for the new page number and indicate the correct original image file for that page. We also no longer need @type, since that was indicated earlier, in the <div> tag.

If the printed page number is obviously in error, follow the procedure for <div> above and add @n to <pb>.
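
As a rough sketch of how a script might generate the <pb> tag, including the optional @n for a misnumbered page: the function name pb_tag and its parameters are hypothetical, and the @xml:id layout is inferred from the example above rather than taken from the project's code.

```python
from xml.sax.saxutils import quoteattr


def pb_tag(facs, edition, volume, page, printed_page=None):
    """Return a <pb/> tag for the page that begins at this point.

    page is the corrected page number; printed_page is given only when
    the number printed in the original is obviously in error.
    """
    image_no = facs.rsplit(".", 1)[0][-4:]  # last four digits of the image filename
    # No two-digit entry-position code here: that belongs to <div> only.
    xml_id = f"kp-eb{edition:02d}{volume:02d}-{image_no}-{page:04d}"
    attrs = [("facs", facs), ("xml:id", xml_id)]
    if printed_page is not None:
        attrs.append(("n", f"misnumbered {printed_page}"))
    body = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    return f"<pb {body}/>"


print(pb_tag("encyclopediabrit24newyrich_raw_0183.jp2", 9, 24, 165))
```

Called with printed_page=156, for instance, the same function appends n="misnumbered 156" after the @xml:id.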

Pagebreaks inside notes

Note text sometimes runs over to the next page. When preparing the page files for conversion to entry files, we insert a placeholder to indicate the location of the pagebreak in the note text, using <pb n="nnn"/>, where nnn is the page number of the text that follows. In the entry file, we replace @n with the same @xml:id as the <pb> in the body text and add @corresp, to signal that both mark the same printed page.