Seventh Edition

Encyclopedia Britannica, Seventh Edition: A Machine-Readable Text Transcription

format            segment  version  size (ZIP)  # of files  date        GitHub repository  download
Plain text (TXT)  all      2.1      55 MB       21,432      2024-03-08  eb07/TXT_v2        ZIP file
TEI (XML)         A-G      2.0      61 MB       10,156      2023-10-18  eb07/XML_v2        ZIP file
TEI (XML)         H-Z      2.0      70 MB       10,870      2023-10-18  eb07/XML_v2        ZIP file
Plain text (TXT)  all      2.0      56 MB       21,432      2023-10-18
TEI (XML)         A-L      1.0      79 MB       13,081      2022-12-09  eb07/XML           ZIP file
TEI (XML)         M-Z      1.0      51 MB       7,903       2022-12-09  eb07/XML           ZIP file
Plain text (TXT)  all      1.2      55 MB       20,984      2022-10-07  eb07/TXT           ZIP file

Release notes

  • 2024-03-08: TXT Release v2.1. Restores the source information for each entry that was missing from v2.0.
  • 2023-10-18: XML Release v2.0 and TXT Release v2.0. These contain approximately 500 free-standing entries that were previously merged with other entries.
  • 2022-12-09: XML Release v1.0 (TEI encoding) (2 ZIP files).
  • 2022-11-01: TXT Release v1.2: remove space before closing bracket.
  • 2022-10-07: TXT Release v1.1: correct cite for source text; remove space before end punctuation.
  • 2022-10-05: TXT Release v1.01: 3 entries fixed, 2 removed.
  • 2022-10-04: We're excited to publish the text of this edition as TXT files.

Content notes

Plain text files

  • Page breaks are indicated in-line as [edition:volume:page].
  • Footnotes and marginal notes are included in-line at the point of the siglum, as ^[1. This is note text.]
  • Tables are out of scope and indicated in the text with [table].
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
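
As a rough illustration of the in-line conventions above, the following Python sketch pulls page breaks and notes out of an entry's text and produces a marker-free reading text. It assumes that the three fields of a page break are numeric (for example [7:8:222]), that sigla are numeric as in the example note, and that note text contains no nested square brackets; none of this is confirmed by the notes above. The function names and the sample sentence are invented for the example.

```python
import re

# Assumed shapes, inferred only from the examples in the content notes:
#   page break: [edition:volume:page], all numeric
#   note:       ^[1. This is note text.], numeric siglum, no nested brackets
PAGE_BREAK = re.compile(r"\[(\d+):(\d+):(\d+)\]")
FOOTNOTE = re.compile(r"\^\[(\d+)\.\s*([^\[\]]*)\]")


def extract_markers(text):
    """Return (page_breaks, notes) found in one entry's text."""
    pages = [tuple(int(g) for g in m.groups()) for m in PAGE_BREAK.finditer(text)]
    notes = [(int(m.group(1)), m.group(2).strip()) for m in FOOTNOTE.finditer(text)]
    return pages, notes


def strip_markers(text):
    """Remove notes, page breaks and [table] placeholders from the text."""
    text = FOOTNOTE.sub("", text)
    text = PAGE_BREAK.sub("", text)
    text = text.replace("[table]", "")
    # Collapse the runs of spaces left behind by removed markers.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


sample = "The river rises^[1. See the map.] in the hills [7:8:222] and flows south. [table]"
pages, notes = extract_markers(sample)
print(pages)
print(notes)
print(strip_markers(sample))
```

Notes are removed before page breaks so that a page break falling after a note is still matched on its own; real entries may need a more tolerant pattern.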

TEI files

  • Index terms for the entry content are in the <profileDesc> section of the <teiHeader>. Each includes a prefaced URI for the named authority file. See Master Files for further information.
  • Page breaks include a prefaced URI for the online source image. The URI resolves to a full URL when the file is output to a display format.
  • Footnotes and marginal notes are included in-line at the point of the siglum.
  • Tables are out of scope and are left uncorrected.
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
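
The notes above do not spell out the element layout, so the following Python sketch is only a guess at how the index terms and page-break references might be read. It assumes the common TEI pattern of <term ref="..."> elements somewhere under <profileDesc> and <pb facs="..."/> for page breaks; the sample document, the prefixes auth: and img:, and the function names are all hypothetical. Only the TEI namespace URI is standard.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # standard TEI namespace
NS = {"tei": TEI}

# Hypothetical miniature document; the real files may be laid out differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <textClass>
        <keywords>
          <term ref="auth:12345">Example term</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text><body><p>Entry text <pb facs="img:0001"/> continues.</p></body></text>
</TEI>"""


def index_terms(root):
    """Return (ref, label) pairs for <term> elements under <profileDesc>."""
    terms = []
    for profile in root.iterfind(".//tei:profileDesc", NS):
        for term in profile.iter(f"{{{TEI}}}term"):
            terms.append((term.get("ref"), (term.text or "").strip()))
    return terms


def page_break_refs(root):
    """Return the assumed facs attribute of every <pb> element."""
    return [pb.get("facs") for pb in root.iter(f"{{{TEI}}}pb")]


root = ET.fromstring(SAMPLE)
print(index_terms(root))
print(page_break_refs(root))
```

The returned references are left in their prefaced form; expanding them to full URLs depends on prefix definitions that are not described here.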

Storage format

  • Files. Entries are in individual files with a header for the Knowledge Project. They can be individually downloaded from the GitHub repository.
  • ZIP file. To easily download the complete edition in either TXT or XML format, use the ZIP file(s).
  • Directories. Files are organized in directories named for the initial letter of the entry followed by the volume number of the print edition.
    For example, the directory j12 contains all entries in volume 12 that begin with the letter 'J'.
  • File names are meaningful. Example: kp-eb0708-022205-1234-v1.txt
    • kp = Knowledge Project
    • eb0708 = 7th ed., print vol. 8
    • 022205 = print page 222 (0222), 5th entry on the page (05)
    • 1234 = last 4 digits of the source image file name (makes file names unique)
    • v1 = version 1
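
The naming scheme above can be unpacked mechanically. The Python sketch below is one possible parser, not an official tool: the fixed field widths are inferred from the single example file name, the .xml extension for TEI files is an assumption, and the names EntryFile, parse_entry_file_name and parse_directory_name are invented for the illustration.

```python
import re
from typing import NamedTuple


class EntryFile(NamedTuple):
    edition: int
    volume: int
    page: int
    entry_on_page: int
    image_suffix: str  # kept as text so leading zeros survive
    version: int
    extension: str


# Field widths are assumed from the example kp-eb0708-022205-1234-v1.txt.
FILE_NAME = re.compile(
    r"^kp-eb(?P<edition>\d{2})(?P<volume>\d{2})"
    r"-(?P<page>\d{4})(?P<entry>\d{2})"
    r"-(?P<image>\d{4})"
    r"-v(?P<version>\d+)\.(?P<ext>txt|xml)$"
)

# Directory names such as j12: one letter followed by the volume number.
DIRECTORY = re.compile(r"^(?P<letter>[a-z])(?P<volume>\d+)$")


def parse_entry_file_name(name: str) -> EntryFile:
    m = FILE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized entry file name: {name!r}")
    return EntryFile(
        edition=int(m["edition"]),
        volume=int(m["volume"]),
        page=int(m["page"]),
        entry_on_page=int(m["entry"]),
        image_suffix=m["image"],
        version=int(m["version"]),
        extension=m["ext"],
    )


def parse_directory_name(name: str) -> tuple:
    m = DIRECTORY.match(name)
    if m is None:
        raise ValueError(f"unrecognized directory name: {name!r}")
    return m["letter"].upper(), int(m["volume"])


print(parse_entry_file_name("kp-eb0708-022205-1234-v1.txt"))
print(parse_directory_name("j12"))
```

Raising ValueError on a non-matching name, rather than returning partial fields, keeps any files that follow a different pattern from being silently misread.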