Encyclopedia Britannica, Seventh Edition: A Machine-Readable Text Transcription
version | format | segment | size | # of files | date | GitHub repository | download |
---|---|---|---|---|---|---|---|
3.0 | TEI (XML) | A-H | 250 MB | 11,129 | 2024-12-10 | eb07/XML_v3 | ZIP file |
3.0 | TEI (XML) | I-Z | 246 MB | 9,987 | 2024-12-10 | eb07/XML_v3 | ZIP file |
3.0 | Plain text (TXT) | all | 125 MB | 21,116 | 2024-12-10 | eb07/TXT_v3 | ZIP file |
2.1 | Plain text (TXT) | all | 129 MB | 20,992 | 2024-03-08 | eb07/TXT_v2 | ZIP file |
2.0 | TEI (XML) | A-G | 209 MB | 10,157 | 2023-10-18 | eb07/XML_v2 | ZIP file |
2.0 | TEI (XML) | H-Z | 239 MB | 10,870 | 2023-10-18 | eb07/XML_v2 | ZIP file |
2.0 | Plain text (TXT) | all | 129 MB | 20,992 | 2023-10-18 | use v2.1 | use v2.1 |
1.0 | TEI (XML) | A-L | 263 MB | 13,081 | 2022-12-09 | eb07/XML | ZIP file |
1.0 | TEI (XML) | M-Z | 174 MB | 7,903 | 2022-12-09 | eb07/XML | ZIP file |
1.2 | Plain text (TXT) | all | 129 MB | 20,983 | 2022-10-07 | eb07/TXT | ZIP file |
Release notes
- 2024-12-10: XML Release v3.0. Improved text accuracy, 90 new free-standing entries.
- 2024-12-10: TXT Release v3.0. Improved text accuracy, 90 new free-standing entries.
- 2024-03-08: TXT Release v2.1. Adds the source information for each entry, which was missing from v2.0.
- 2023-10-18: XML Release v2.0 and TXT Release v2.0. These contain approximately free-standing entries that were previously merged with other entries.
- 2022-12-09: XML Release v1.0 (TEI encoding) (2 ZIP files).
- 2022-11-01: TXT Release v1.2: removed the space before closing brackets.
- 2022-10-07: TXT Release v1.1: corrected the citation for the source text; removed the space before end punctuation.
- 2022-10-05: TXT Release v1.01: 3 entries fixed, 2 removed.
- 2022-10-04: We're excited to publish the text of this edition as TXT files.
Content notes
Plain text files
- Page beginnings are indicated numerically in-line as [edition:volume:page].
- Footnotes and marginal notes are included in-line at the point of the siglum, as ^[1. This is note text.]
- Tables are out of scope and indicated in the text with [table].
- Formulas are out of scope and are left uncorrected.
- Further information is available at Editorial Standards.
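As a rough illustration of how the in-line conventions above might be consumed, here is a minimal Python sketch. It assumes page markers are purely numeric (for example [7:2:5]), that note text never contains nested square brackets, and that the [table] stub appears literally; the function names and the sample sentence are hypothetical and not part of the release.

```python
import re
from dataclasses import dataclass

# Assumed forms, based on the plain-text conventions described above:
#   page marker: [7:2:5]                 -> [edition:volume:page], digits only
#   note:        ^[1. This is note text.] (assumes no nested square brackets)
#   table stub:  [table]
PAGE_RE = re.compile(r"\[(\d+):(\d+):(\d+)\]")
NOTE_RE = re.compile(r"\^\[([^\[\]]*)\]")
TABLE_TOKEN = "[table]"


@dataclass
class PageMarker:
    edition: int
    volume: int
    page: int
    offset: int  # character offset of the marker within the text


def page_markers(text: str) -> list[PageMarker]:
    """Return every [edition:volume:page] marker in document order."""
    return [
        PageMarker(int(m.group(1)), int(m.group(2)), int(m.group(3)), m.start())
        for m in PAGE_RE.finditer(text)
    ]


def notes(text: str) -> list[str]:
    """Return the contents of every in-line ^[...] note."""
    return [m.group(1) for m in NOTE_RE.finditer(text)]


def body_text(text: str) -> str:
    """Strip notes, page markers and [table] stubs, leaving running prose."""
    text = NOTE_RE.sub("", text)
    text = PAGE_RE.sub("", text)
    text = text.replace(TABLE_TOKEN, "")
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    # Hypothetical sample, not taken from the transcription.
    sample = "ABACUS, an instrument^[1. See Arithmetic.] for counting. [7:2:5] It is used [table] widely."
    print(page_markers(sample))
    print(notes(sample))
    print(body_text(sample))
```

Notes are removed before page markers so that a marker can never be mistaken for part of a note; if real notes do contain brackets, NOTE_RE would need a small bracket-counting parser instead.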
TEI files
- Index terms for the entry content are in the <profileDesc> section of the <teiHeader>. Each term includes a prefaced URI for the named authority file. See Master Files for further information.
- Page breaks include a prefaced URI for the online source image. The URI resolves to a full URL when output to display formats.
- Footnotes and marginal notes are included in-line at the point of the siglum.
- Tables are out of scope and are left uncorrected.
- Formulas are out of scope and are left uncorrected.
- Further information is available at Editorial Standards.
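The following Python sketch shows one way to read index terms and page breaks from a TEI file and expand their prefaced URIs. It assumes the files use the standard TEI P5 namespace, that index terms are <term> elements with a ref attribute inside <profileDesc>, that page breaks are <pb> elements with n and facs attributes, and that prefixes are declared with TEI's <prefixDef> mechanism. None of these details is confirmed here; the prefixes "img" and "auth" and the example.org URL in the sample are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed to be the one these files declare.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

PrefixDefs = dict[str, tuple[str, str]]


def prefix_defs(root: ET.Element) -> PrefixDefs:
    """Map each <prefixDef ident="..."> to (matchPattern, replacementPattern)."""
    return {
        pd.get("ident"): (pd.get("matchPattern"), pd.get("replacementPattern"))
        for pd in root.iterfind(".//tei:prefixDef", NS)
    }


def resolve(uri: str, defs: PrefixDefs) -> str:
    """Expand a prefaced URI such as 'img:0002-0005' into a full URL.

    Returns the URI unchanged when its prefix has no matching prefixDef.
    """
    prefix, sep, value = uri.partition(":")
    if not sep or prefix not in defs:
        return uri
    pattern, replacement = defs[prefix]
    m = re.fullmatch(pattern, value)
    if m is None:
        return uri
    # TEI writes back-references as $1, $2 ...; Match.expand expects \g<1>, \g<2> ...
    return m.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))


def index_terms(root: ET.Element, defs: PrefixDefs) -> list[tuple[str, str]]:
    """(term text, resolved authority URI) for each <term> in <profileDesc>."""
    profile = root.find(".//tei:teiHeader/tei:profileDesc", NS)
    if profile is None:
        return []
    return [
        ((t.text or "").strip(), resolve(t.get("ref", ""), defs))
        for t in profile.iterfind(".//tei:term", NS)
    ]


def page_breaks(root: ET.Element, defs: PrefixDefs) -> list[tuple[str, str]]:
    """(page number, resolved source-image URL) for each <pb> element."""
    return [
        (pb.get("n", ""), resolve(pb.get("facs", ""), defs))
        for pb in root.iterfind(".//tei:pb", NS)
    ]


# Hypothetical minimal file illustrating the assumed structure.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <listPrefixDef>
        <prefixDef ident="img" matchPattern="(.+)"
                   replacementPattern="https://example.org/images/$1.jpg"/>
      </listPrefixDef>
    </encodingDesc>
    <profileDesc>
      <textClass><keywords><term ref="auth:abacus">Abacus</term></keywords></textClass>
    </profileDesc>
  </teiHeader>
  <text><body><pb n="5" facs="img:0002-0005"/><p>ABACUS, an instrument</p></body></text>
</TEI>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    defs = prefix_defs(root)
    print(index_terms(root, defs))
    print(page_breaks(root, defs))
```

URIs whose prefix has no declaration (here "auth:abacus") are returned untouched, so the sketch degrades gracefully if the real files resolve prefixes some other way.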
Storage format
- Files. Entries are in individual files, each with a header for the Knowledge Project. They can be downloaded individually from the GitHub repository.
- ZIP file. To easily download the complete edition in either TXT or XML formats, use the ZIP file(s).
- Directories. Files are organized in directories named for the first letter of the entry plus the volume number of the print edition. For example, the directory j12 contains all entries in volume 12 that begin with the letter 'J'.
- File names are meaningful. Example: kp-eb0708-012301-2345-v1.txt
  - kp = Knowledge Project
  - eb0708 = Encyclopedia Britannica, 7th ed., print vol. 8
  - 012301 = print page 123, 1st entry on the page
  - 2345 = last 4 digits of the source image file name (makes file names unique)
  - v1 = version 1
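The file-name scheme above is regular enough to parse mechanically. This Python sketch assumes every name uses the same fixed field widths as the documented example (2-digit edition and volume, 4-digit page, 2-digit entry, 4-digit image suffix) and that TEI files differ only by an .xml extension; both are assumptions, and the names parse_name and EntryName are hypothetical.

```python
import re
from dataclasses import dataclass

# Field widths follow the documented example kp-eb0708-012301-2345-v1.txt;
# treating them as fixed for every file (and allowing .xml) is an assumption.
NAME_RE = re.compile(
    r"^kp-eb(?P<edition>\d{2})(?P<volume>\d{2})"
    r"-(?P<page>\d{4})(?P<entry>\d{2})"
    r"-(?P<image>\d{4})-v(?P<version>\d+)\.(?P<ext>txt|xml)$"
)


@dataclass(frozen=True)
class EntryName:
    edition: int
    volume: int
    page: int
    entry: int    # position of the entry on its print page (1 = first)
    image: str    # last 4 digits of the source image file name, kept as text
    version: int
    ext: str


def parse_name(filename: str) -> EntryName:
    """Split a Knowledge Project file name into its documented fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a Knowledge Project file name: {filename!r}")
    g = m.groupdict()
    return EntryName(
        edition=int(g["edition"]),
        volume=int(g["volume"]),
        page=int(g["page"]),
        entry=int(g["entry"]),
        image=g["image"],
        version=int(g["version"]),
        ext=g["ext"],
    )


if __name__ == "__main__":
    print(parse_name("kp-eb0708-012301-2345-v1.txt"))
```

The image suffix is kept as a string so leading zeros survive. Sorting parsed names by (volume, page, entry) should then approximate print order, provided the fixed-width assumption holds.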
This work is licensed under a Creative Commons Attribution 4.0 International License.