Page Numbers

Specifies the encoding method for page numbers in TEI.

All page numbers must be encoded along with the edition and volume number of the print original. We further include a link to the online image file used in the OCR process for the page.

Page and/or image information is included in three different places:
  1. The <TEI> element at the beginning of every file.
  2. The <div type="entry"> element at the start of the entry text.
  3. The <pb> (page beginning) element indicating the beginning of a new page in the entry.
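
As a rough sketch of how these three places fit together, the skeleton below nests the examples used on this page in a single file. The <teiHeader> is omitted, and the <text>, <body>, and <p> wrappers and the placeholder comments are assumptions added only to make the fragment well-formed.

```xml
<TEI xml:id="kp-eb0302-0698-0691" xml:lang="en" xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted in this sketch -->
  <text>
    <body>
      <div facs="ia:gri_33125011196710/page/n698" type="entry">
        <p>
          <!-- placeholder: entry text on print page 691 -->
          <pb break="no" facs="ia:gri_33125011196710/page/n699" xml:id="kp-eb0302-0699-0692"/>
          <!-- placeholder: entry text continuing on print page 692 -->
        </p>
      </div>
    </body>
  </text>
</TEI>
```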

The <TEI> element

<TEI xml:id="kp-eb0302-0698-0691" xml:lang="en" xmlns="http://www.tei-c.org/ns/1.0">
  1. The @xml:id attribute identifies the first print page of the source material. The first cluster after "kp" ("eb0302") identifies the source as the 3rd edition, volume 2. The next cluster ("0698") records the last four digits of the source image filename. The last cluster ("0691") records the print page number on which the entry starts.
  2. @xml:lang records the principal language of the text ("en" = English).
  3. @xmlns supplies the XML namespace for the document (in this case, the URI of the TEI namespace).

The <div> element

<div facs="ia:gri_33125011196710/page/n698" type="entry">
  1. We use @facs to identify the source information by adding the URI of the image used in the OCR process.
    1. "ia:" is a shorthand notation for the online archive where the image is hosted (in this case, Internet Archive). When output to HTML or PDF, these abbreviations are expanded to their full URI value, producing a valid URI that resolves to the online image.
  2. We include @type with the value "entry" to indicate that the text is an encyclopedia entry.
  3. See Misnumbered Pages note below.
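
One standard TEI mechanism for declaring a prefix such as "ia:" is a <prefixDef> in the header. Whether this project declares the prefix this way is not stated on this page, and the replacement pattern below is an assumed Internet Archive URL form, so treat this as a sketch only.

```xml
<encodingDesc>
  <listPrefixDef>
    <!-- Assumed mapping: ia:X expands to https://archive.org/details/X -->
    <prefixDef ident="ia"
               matchPattern="(.+)"
               replacementPattern="https://archive.org/details/$1">
      <p>Pointers of the form ia:ITEM/page/nNNN refer to page images hosted on Internet Archive.</p>
    </prefixDef>
  </listPrefixDef>
</encodingDesc>
```

Under this assumed mapping, ia:gri_33125011196710/page/n698 would expand to https://archive.org/details/gri_33125011196710/page/n698.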

The <pb> element

<pb break="no" facs="ia:gri_33125011196710/page/n699" xml:id="kp-eb0302-0699-0692"/>
  1. By default, <pb> counts as white space between words; setting @break to "no" turns this off, so a word split across the page break is still read as one word.
  2. Include @facs with the URI for the original page image.
  3. Add the @xml:id for the new page number. The last four digits of the image URI are also inserted into the @xml:id to ensure uniqueness.
  4. See Misnumbered Pages note below.
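
To illustrate @break, here is a hypothetical page break that falls in the middle of a word. The surrounding text is invented for the example; only the <pb> itself is taken from this page.

```xml
<!-- Hypothetical text: the word "encyclopedia" is split across two print pages -->
<p>... as described in the encyclo<pb break="no" facs="ia:gri_33125011196710/page/n699" xml:id="kp-eb0302-0699-0692"/>pedia ...</p>
```

Because @break is "no", the two halves on either side of the <pb> are meant to be treated as a single word rather than as two words separated by white space.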

Pagebreaks within notes

Notice:
We adapted this technique from the Women Writers Project.
Note text can run over to the next page(s). Link the <pb> in the note to the corresponding <pb> in the main text using @xml:id and @corresp on both <pb> elements, to clarify that they both reference the same page break.
  • For the <pb> in the note,
    1. <pb corresp="kp-eb0301-0059-0036" xml:id="pbn336"/>
    2. Add @corresp with the value of the corresponding @xml:id in the main text <pb>.
    3. Create an @xml:id value for the <pb>. It should begin with pbn followed by the edition number and the referenced page number.
  • For the <pb> in the main text,
    1. <pb corresp="pbn336" facs="ia:gri_33125011196827/page/n59" xml:id="kp-eb0301-0059-0036"/>
    2. Add @corresp with the value of the corresponding @xml:id in the note <pb>.
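
Put together, the two linked page breaks might look like the sketch below. The <p> and <note> wrappers, their relative placement, and the placeholder text are assumptions for illustration; the two <pb> elements are the ones shown above.

```xml
<!-- Main text: the page break onto print page 36 -->
<p>... main text at the end of the previous page
  <pb corresp="pbn336" facs="ia:gri_33125011196827/page/n59" xml:id="kp-eb0301-0059-0036"/>
  main text continuing on page 36 ...</p>

<!-- Note: begins on the previous page and runs over onto page 36 -->
<note>... note text on the previous page
  <pb corresp="kp-eb0301-0059-0036" xml:id="pbn336"/>
  note text continuing on page 36 ...</note>
```

Each @corresp value matches the @xml:id of the other <pb>, so the link can be followed in either direction.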

Misnumbered pages

n="misnumbered 0150" xml:id="kp-eb0302-0162-0148"

Occasionally, a page number in the print edition is obviously wrong. In such cases, use the correct page number in @xml:id. Then add @n with the value misnumbered nnn, replacing nnn with the page number as printed. This indicates that the printed page number was out of sequence. In the example above, the page printed as 150 is actually page 148.
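
As a sketch, the two attributes above would sit on a complete <pb> like this. The @facs value is an assumption, inferred from the image number in the @xml:id and the volume identifier used in the earlier examples.

```xml
<!-- The print page shows 150, but the page is 148 in sequence -->
<!-- facs is an assumed value, inferred from the 0162 cluster in xml:id -->
<pb facs="ia:gri_33125011196710/page/n162" n="misnumbered 0150" xml:id="kp-eb0302-0162-0148"/>
```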