Seventh Edition

Encyclopedia Britannica, Seventh Edition: A Machine-Readable Text Transcription

version  format            segment  size    # of files  date        GitHub repository  download
3.0      TEI (XML)         A-H      250 MB  11,129      2024-12-10  eb07/XML_v3        ZIP file
3.0      TEI (XML)         I-Z      246 MB  9,987       2024-12-10  eb07/XML_v3        ZIP file
3.0      Plain text (TXT)  all      125 MB  21,116      2024-12-10  eb07/TXT_v3        ZIP file
2.1      Plain text (TXT)  all      129 MB  20,992      2024-03-08  eb07/TXT_v2        ZIP file
2.0      TEI (XML)         A-G      209 MB  10,157      2023-10-18  eb07/XML_v2        ZIP file
2.0      TEI (XML)         H-Z      239 MB  10,870      2023-10-18  eb07/XML_v2        ZIP file
2.0      Plain text (TXT)  all      129 MB  20,992      2023-10-18  use v2.1           use v2.1
1.0      TEI (XML)         A-L      263 MB  13,081      2022-12-09  eb07/XML           ZIP file
1.0      TEI (XML)         M-Z      174 MB  7,903       2022-12-09  eb07/XML           ZIP file
1.2      Plain text (TXT)  all      129 MB  20,983      2022-10-07  eb07/TXT           ZIP file

Release notes

  • 2024-12-10: XML Release v3.0. Improved text accuracy, 90 new free-standing entries.
  • 2024-12-10: TXT Release v3.0. Improved text accuracy, 90 new free-standing entries.
  • 2024-03-08: TXT Release v2.1. Includes the source information for each entry that was missing from v2.0.
  • 2023-10-18: XML Release v2.0 and TXT Release v2.0. These releases add free-standing entries that were previously merged with other entries.
  • 2022-12-09: XML Release v1.0 (TEI encoding) (2 ZIP files).
  • 2022-11-01: TXT Release v1.2: remove space before closing bracket.
  • 2022-10-07: TXT Release v1.1: correct cite for source text; remove space before end punctuation.
  • 2022-10-05: TXT Release v1.01: 3 entries fixed, 2 removed.
  • 2022-10-04: We're excited to publish the text of this edition as TXT files.

Content notes

Plain text files

  • Page beginnings are indicated numerically in-line as [edition:volume:page].
  • Footnotes and marginal notes are included in-line at the point of the siglum, as ^[1. This is note text.]
  • Tables are out of scope and indicated in the text with [table].
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
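
The in-line markers described above can be separated from the running text with a couple of regular expressions. The following is a minimal sketch, not the project's own tooling. It assumes that page markers look like [7:8:123] (three numbers) and that note sigla are numeric, as in the example above. The sample string and the function name extract_markers are invented for illustration, and notes that themselves contain a closing bracket would need a more careful parser.

```python
import re

# Assumed marker shapes, inferred from the content notes:
#   page marker: [7:8:123]   (edition:volume:page, all numeric)
#   note:        ^[1. This is note text.]
PAGE_RE = re.compile(r"\[(\d+):(\d+):(\d+)\]")
NOTE_RE = re.compile(r"\^\[(\d+)\.\s*([^\]]*)\]")


def extract_markers(text):
    """Return (pages, notes, body) for the text of one entry.

    pages: list of (edition, volume, page) tuples
    notes: list of (siglum, note text) tuples
    body:  the text with page markers and notes removed
    """
    pages = [tuple(int(n) for n in m.groups()) for m in PAGE_RE.finditer(text)]
    notes = [(int(m.group(1)), m.group(2).strip()) for m in NOTE_RE.finditer(text)]
    body = NOTE_RE.sub("", text)              # drop in-line notes
    body = PAGE_RE.sub("", body)              # drop page markers
    body = re.sub(r"\s+", " ", body).strip()  # collapse leftover whitespace
    return pages, notes, body


# Made-up sample text, not taken from the edition.
sample = "ABACUS, [7:2:5] an instrument^[1. This is note text.] for counting. [table]"
pages, notes, body = extract_markers(sample)
print(pages)  # [(7, 2, 5)]
print(notes)  # [(1, 'This is note text.')]
print(body)   # ABACUS, an instrument for counting. [table]
```

The [table] placeholder is left in the body on purpose, so that downstream code can still see where a table was omitted.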

TEI files

  • Index terms for the entry content are in the <profileDesc> section of the <teiHeader>. Each includes a prefixed URI for the named authority file. See Master Files for further information.
  • Page breaks include a prefixed URI for the online source image. The URI resolves to a full URL when output to display formats.
  • Footnotes and marginal notes are included in-line at the point of the siglum.
  • Tables are out of scope and are left uncorrected.
  • Formulas are out of scope and are left uncorrected.
  • Further information is available at Editorial Standards.
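
As a rough illustration of reading the index terms from a TEI file, the sketch below uses Python's standard xml.etree.ElementTree module. The TEI namespace is the standard one. Everything else is an assumption made for the example: that terms are encoded as <term> elements somewhere under <profileDesc>, that the URI sits in a ref attribute, and the sample document itself (including the value auth:12345). Check an actual file from the repository before relying on these names.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def index_terms(tei_xml):
    """Return (term text, ref attribute) pairs found under <profileDesc>.

    Assumes <term> elements with an optional ref attribute; the real
    encoding in the released files may differ.
    """
    root = ET.fromstring(tei_xml)
    path = ".//tei:teiHeader/tei:profileDesc//tei:term"
    return [
        ("".join(term.itertext()).strip(), term.get("ref"))
        for term in root.iterfind(path, TEI_NS)
    ]


# Hypothetical minimal document, invented for this example.
sample_tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <textClass>
        <keywords>
          <term ref="auth:12345">Abacus</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text><body><p>ABACUS ...</p></body></text>
</TEI>"""

terms = index_terms(sample_tei)
print(terms)  # [('Abacus', 'auth:12345')]
```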

Storage format

  • Files. Entries are in individual files with a header for the Knowledge Project. They can be individually downloaded from the GitHub repository.
  • ZIP file. To easily download the complete edition in either TXT or XML format, use the ZIP file(s).
  • Directories. Files are organized in directories named for the letter of the entry + the volume number of the print edition.
    For example, the directory j12 contains all entries in volume 12 that begin with the letter 'J'.
  • File names are meaningful. Example: kp-eb0708-012301-2345-v1.txt
    • kp = Knowledge Project
    • eb0708 = Encyclopedia Britannica, 7th ed., print vol. 8
    • 012301 = print page 123, 1st entry on the page
    • 2345 = last 4 digits of the source image file name (makes file names unique)
    • v1 = version 1
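
Because the file names are structured, they can be decoded programmatically. The sketch below is one possible reading of the pattern shown above, not an official parser. It assumes a four-digit page number followed by a two-digit entry position (as in 012301), two digits each for edition and volume, and a directory name built from the lowercase first letter plus the unpadded volume number (as in j12). The names EntryFile, parse_entry_filename, and directory_for are hypothetical.

```python
import re
from typing import NamedTuple


class EntryFile(NamedTuple):
    edition: int
    volume: int
    page: int
    entry_on_page: int
    image_suffix: str  # last 4 digits of the source image file name
    version: int
    extension: str


# Field widths are assumptions based on the single documented example.
NAME_RE = re.compile(r"^kp-eb(\d{2})(\d{2})-(\d{4})(\d{2})-(\d{4})-v(\d+)\.(\w+)$")


def parse_entry_filename(name):
    """Split a name such as kp-eb0708-012301-2345-v1.txt into its parts."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized entry file name: {name!r}")
    ed, vol, page, nth, img, ver, ext = m.groups()
    return EntryFile(int(ed), int(vol), int(page), int(nth), img, int(ver), ext)


def directory_for(headword, volume):
    """Directory name: first letter of the entry + print volume number.

    Assumes no zero padding of the volume, matching the j12 example.
    """
    return f"{headword[0].lower()}{volume}"


info = parse_entry_filename("kp-eb0708-012301-2345-v1.txt")
print(info.volume, info.page, info.entry_on_page)  # 8 123 1
print(directory_for("Jamaica", 12))                # j12
```

The image suffix is kept as a string so that leading zeros survive.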