Seventh Edition
Encyclopedia Britannica, Seventh Edition: A Machine-Readable Text Transcription
format | segment | version | size (ZIP) | # of files | date | GitHub repository | download |
---|---|---|---|---|---|---|---|
Plain text (TXT) | all | 2.1 | 55 MB | 21,432 | 2024-03-08 | eb07/TXT_v2 | ZIP file |
TEI (XML) | A-G | 2.0 | 61 MB | 10,156 | 2023-10-18 | eb07/XML_v2 | ZIP file |
TEI (XML) | H-Z | 2.0 | 70 MB | 10,870 | 2023-10-18 | eb07/XML_v2 | ZIP file |
Plain text (TXT) | all | 2.0 | 56 MB | 21,432 | 2023-10-18 | | |
TEI (XML) | A-L | 1.0 | 79 MB | 13,081 | 2022-12-09 | eb07/XML | ZIP file |
TEI (XML) | M-Z | 1.0 | 51 MB | 7,903 | 2022-12-09 | eb07/XML | ZIP file |
Plain text (TXT) | all | 1.2 | 55 MB | 20,984 | 2022-10-07 | eb07/TXT | ZIP file |
Release notes
- 2024-03-08: TXT Release v2.1. Restores the source information for each entry that was missing from v2.0.
- 2023-10-18: XML Release v2.0 and TXT Release v2.0. These contain approximately 500 free-standing entries that were previously merged with other entries.
- 2022-12-09: XML Release v1.0 (TEI encoding) (2 ZIP files).
- 2022-11-01: TXT Release v1.2: remove space before closing bracket.
- 2022-10-07: TXT Release v1.1: correct cite for source text; remove space before end punctuation.
- 2022-10-05: TXT Release v1.01: 3 entries fixed, 2 removed.
- 2022-10-04: We're excited to publish the text of this edition as TXT files.
Content notes
Plain text files
- Page breaks are indicated in-line as [edition:volume:page].
- Footnotes and marginal notes are included in-line at the point of the siglum, as ^[1. This is note text.]
- Tables are out of scope and indicated in the text with [table].
- Formulas are out of scope and are left uncorrected.
- Further information is available at Editorial Standards.
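
The in-line conventions above can be picked out with regular expressions. The following is a minimal sketch, not part of the release: it assumes that the three fields of a page-break marker are numeric (for example `[7:8:222]`) and that note text contains no nested square brackets. The function names and the sample sentence are made up for illustration.

```python
import re

# Assumption: page-break markers look like [7:8:222] (edition:volume:page, all digits).
PAGE_BREAK = re.compile(r"\[(\d+):(\d+):(\d+)\]")
# Assumption: a note is ^[siglum. text] and the text has no nested "]".
NOTE = re.compile(r"\^\[([^\s.\]]+)\.\s*([^\]]*)\]")


def page_breaks(text):
    """Return (edition, volume, page) tuples for every page-break marker."""
    return [tuple(int(g) for g in m.groups()) for m in PAGE_BREAK.finditer(text)]


def notes(text):
    """Return (siglum, note text) pairs for every in-line note."""
    return [(m.group(1), m.group(2).strip()) for m in NOTE.finditer(text)]


def strip_markup(text):
    """Remove notes, page-break markers and [table] placeholders."""
    text = NOTE.sub("", text)
    text = PAGE_BREAK.sub("", text)
    text = text.replace("[table]", "")
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r" {2,}", " ", text).strip()


# Hypothetical sample sentence, not taken from the edition.
sample = (
    "The river rises^[1. This is note text.] in the hills "
    "[7:8:222] and flows south. [table]"
)
print(page_breaks(sample))
print(notes(sample))
print(strip_markup(sample))
```

Removing notes before page breaks matters only if a note itself contains a page-break marker; in that case the marker disappears together with the note.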
TEI files
- Index terms for the entry content are in the <profileDesc> section of the <teiHeader>. Each includes a prefixed URI for the named authority file. See Master Files for further information.
- Page breaks include a prefixed URI for the online source image. The URI resolves to a full URL when output to display formats.
- Footnotes and marginal notes are included in-line at the point of the siglum.
- Tables are out of scope and are left uncorrected.
- Formulas are out of scope and are left uncorrected.
- Further information is available at Editorial Standards.
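
As a rough illustration of reading the header, the sketch below lists every element under <profileDesc> that carries text, together with its attributes. Only the <teiHeader> and <profileDesc> names come from the notes above. The sample document, the `textClass`/`keywords`/`term` nesting, the `ref` attribute and the `auth:` prefix are assumptions made for this example and may not match the released files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the structure under <profileDesc> is an assumption.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <textClass>
        <keywords>
          <term ref="auth:12345">Example term</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text><body><p>Entry text.</p></body></text>
</TEI>"""


def profile_entries(xml_text):
    """Return (local tag, attributes, text) for each text-bearing element
    found under <profileDesc>, whatever its element name is."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any namespace (supported by find() since Python 3.8).
    profile = root.find(".//{*}profileDesc")
    if profile is None:
        return []
    entries = []
    for element in profile.iter():
        text = (element.text or "").strip()
        if text:
            local_name = element.tag.rsplit("}", 1)[-1]  # drop "{namespace}"
            entries.append((local_name, dict(element.attrib), text))
    return entries


print(profile_entries(SAMPLE))
```

Walking every descendant instead of searching for one element name keeps the sketch independent of how the index terms are actually encoded.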
Storage format
- Files. Entries are in individual files with a header for the Knowledge Project. They can be individually downloaded from the GitHub repository.
- ZIP file. To easily download the complete edition in either TXT or XML format, use the ZIP file(s).
- Directories. Files are organized in directories named for the initial letter of the entry plus the volume number of the print edition. For example, the directory j12 contains all entries in volume 12 that begin with the letter 'J'.
- File names are meaningful.
Example: kp-eb0708-022205-1234-v1.txt
- kp = Knowledge Project
- eb0708 = 7th ed., print vol. 8
- 022205 = print page 222, 5th entry on the page
- 1234 = last 4 digits of the source image file name (makes file names unique)
- v1 = version 1
- This work is licensed under a Creative Commons Attribution 4.0 International License.
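
The file-name and directory conventions described under Storage format can be decoded mechanically. This is a minimal sketch under stated assumptions: the page field is always four digits followed by a two-digit entry number (as in the example `022205`), the image suffix is four digits, and XML files follow the same pattern with an `.xml` extension. The names `EntryFile`, `parse_file_name` and `parse_directory` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: field widths are fixed as in kp-eb0708-022205-1234-v1.txt.
FILE_NAME = re.compile(
    r"^kp-eb(?P<edition>\d{2})(?P<volume>\d{2})"
    r"-(?P<page>\d{4})(?P<entry>\d{2})"
    r"-(?P<image>\d{4})-v(?P<version>\d+)\.(?P<ext>txt|xml)$"
)
# Directory names: one letter followed by the print volume number, e.g. j12.
DIRECTORY = re.compile(r"^(?P<letter>[a-z])(?P<volume>\d+)$")


class EntryFile(NamedTuple):
    edition: int
    volume: int
    page: int
    entry: int          # position of the entry on the print page
    image_suffix: str   # last 4 digits of the source image file name
    version: int
    extension: str


def parse_file_name(name):
    """Split an entry file name into its documented components."""
    m = FILE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return EntryFile(
        edition=int(m["edition"]),
        volume=int(m["volume"]),
        page=int(m["page"]),
        entry=int(m["entry"]),
        image_suffix=m["image"],
        version=int(m["version"]),
        extension=m["ext"],
    )


def parse_directory(name):
    """Split a directory name such as 'j12' into (letter, volume)."""
    m = DIRECTORY.match(name)
    if m is None:
        raise ValueError(f"unrecognized directory name: {name!r}")
    return m["letter"], int(m["volume"])


print(parse_file_name("kp-eb0708-022205-1234-v1.txt"))
print(parse_directory("j12"))
```

The image suffix is kept as a string so that leading zeros survive.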