Nineteenth-Century Knowledge Project

Google Drive

Used for image files.

Google Drive stores the OCR project files because it can handle large image files.

It contains the following primary folders:

Folder name    Description
ebnn           The eb03, eb07, eb09, and eb11 repositories contain OCR project files.
information    A repository for general information about Encyclopedia Britannica and the Nineteenth-Century Knowledge Project.
metainfo       Contains information related to the creation of metadata for the Knowledge Project.

Contact: nckp@temple.edu