Ninth Edition

Encyclopedia Britannica, Ninth Edition: A Machine-Readable Text Transcription

Format            Segment  Version  Size (ZIP)  # of files  Date        GitHub repository  Download
TEI (XML)         A-J      1.0      75 MB       9,605       2023-02-28  eb09/XML           ZIP file
TEI (XML)         K-Z      1.0      67 MB       8,167       2023-02-28  eb09/XML           ZIP file
Plain text (TXT)  A-Z      1.0      71.7 MB     17,772      2022-11-01  eb09/TXT           ZIP file

Release notes

  • 2023-02-28: XML Release v1.0 (TEI encoding) (2 ZIP files)
  • 2022-11-01: We're excited to publish v1.0 of the complete text of this edition as TXT files.

Content notes

Plain text files

  • Page breaks are indicated in-line as [edition:volume:page].
  • Footnotes and marginal notes are included in-line at the point of the siglum, as ^[1. This is note text.]
  • Tables are out of scope and indicated in the text with [table].
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
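
The in-line conventions above lend themselves to simple pattern matching. The following is a minimal sketch, not part of the project's tooling: it assumes that the three fields of a page-break marker are purely numeric and that a note contains no nested square brackets. The function name `parse_entry` is hypothetical.

```python
import re

# Assumption: edition, volume and page are all numeric, e.g. [9:8:222].
PAGE_BREAK = re.compile(r"\[(\d+):(\d+):(\d+)\]")
# Assumption: a note runs from "^[" to the next "]" (no nested brackets).
NOTE = re.compile(r"\^\[(.*?)\]", re.DOTALL)
TABLE = "[table]"


def parse_entry(text):
    """Split entry text into clean prose, page breaks, notes and a table count."""
    page_breaks = [tuple(int(p) for p in m.groups())
                   for m in PAGE_BREAK.finditer(text)]
    notes = [m.group(1).strip() for m in NOTE.finditer(text)]
    table_count = text.count(TABLE)

    # Remove the markers, then collapse the doubled spaces they leave behind.
    clean = NOTE.sub("", text)
    clean = PAGE_BREAK.sub("", clean)
    clean = clean.replace(TABLE, "")
    clean = re.sub(r"[ \t]{2,}", " ", clean).strip()
    return clean, page_breaks, notes, table_count


sample = ("The river rises^[1. This is note text.] in the hills. "
          "[9:8:222] It then turns east. [table]")
clean, page_breaks, notes, table_count = parse_entry(sample)
```

For the made-up sample sentence, `page_breaks` is `[(9, 8, 222)]` and `notes` is `["1. This is note text."]`. A note that itself contains a bracketed marker would be cut short by this pattern and would need a more careful parser.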

TEI files

  • Index terms for the entry content are in the <profileDesc> section of the <teiHeader>. Each includes a prefaced URI for the named authority file. See Master Files for further information.
  • Page breaks include a prefaced URI for the online source image. The URI resolves to a full URL when output to display formats.
  • Footnotes and marginal notes are included in-line at the point of the siglum.
  • Tables are out of scope and are left uncorrected.
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
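
As an illustration of where the index terms sit, the sketch below collects every element with text content from the <profileDesc> of a TEI file, using Python's standard library. It assumes the files declare the standard TEI namespace. The element names inside <profileDesc> in the sample (`textClass`, `keywords`, `term`), the `ref` attribute and the `auth:` prefix are hypothetical placeholders, not taken from the project's files; see Master Files for the actual encoding.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace (assumed to be declared in the project's files).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; the real header structure may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <textClass>
        <keywords>
          <term ref="auth:example-id">Example term</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
</TEI>"""


def index_terms(xml_text):
    """Return (tag, text, attributes) for each text-bearing element in profileDesc."""
    root = ET.fromstring(xml_text)
    profile = root.find("./tei:teiHeader/tei:profileDesc", TEI_NS)
    if profile is None:
        return []
    terms = []
    for elem in profile.iter():
        text = (elem.text or "").strip()
        if text:
            # Strip the "{namespace}" part that ElementTree prepends to tags.
            tag = elem.tag.split("}", 1)[-1]
            terms.append((tag, text, dict(elem.attrib)))
    return terms


terms = index_terms(SAMPLE)
```

Because the function walks every descendant of <profileDesc> rather than looking for a specific element, it does not depend on the placeholder names used in the sample.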

Storage format

  • Files. Entries are in individual files with a header for the Knowledge Project. They can be individually downloaded from the GitHub repository.
  • ZIP file. To easily download the complete edition in either TXT or XML format, use the ZIP file(s).
  • Directories. Files are organized in directories named for the first letter of the entry followed by the volume number of the print edition.
    For example, the directory j12 contains all entries in volume 12 that begin with the letter 'J'.
  • File names are meaningful. Example: kp-eb0908-022205-1234-v1.txt
    • kp = Knowledge Project
    • eb0908 = 9th ed., print vol. 8
    • 022205 = print page 222, 5th entry on the page
    • 1234 = last 4 digits of the source image file name (makes file names unique)
    • v1 = version 1
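
The naming scheme above can be decoded mechanically. The sketch below is one possible reading, inferred only from the single example file name and the j12 directory example: it assumes a two-digit edition, two-digit volume, four-digit page, two-digit entry position, and a `.txt` or `.xml` extension. The names `EntryFile`, `parse_file_name` and `parse_directory_name` are hypothetical.

```python
import re
from typing import NamedTuple


class EntryFile(NamedTuple):
    edition: int
    volume: int
    page: int
    entry_on_page: int
    image_suffix: str  # kept as text so leading zeros survive
    version: int
    extension: str


# Field widths are assumptions based on the example kp-eb0908-022205-1234-v1.txt.
FILE_NAME = re.compile(
    r"kp-eb(\d{2})(\d{2})-(\d{4})(\d{2})-(\d{4})-v(\d+)\.(txt|xml)"
)
DIRECTORY = re.compile(r"([a-z])(\d+)")


def parse_file_name(name):
    """Decode an entry file name into its parts."""
    m = FILE_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    ed, vol, page, pos, suffix, ver, ext = m.groups()
    return EntryFile(int(ed), int(vol), int(page), int(pos), suffix, int(ver), ext)


def parse_directory_name(name):
    """Decode a directory name such as 'j12' into (letter, volume)."""
    m = DIRECTORY.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized directory name: {name!r}")
    return m.group(1), int(m.group(2))


info = parse_file_name("kp-eb0908-022205-1234-v1.txt")
letter, volume = parse_directory_name("j12")
```

For the example file name, `info.volume` is 8, `info.page` is 222 and `info.entry_on_page` is 5, matching the breakdown in the list above.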