Nineteenth-Century Knowledge Project

master Folder

Contains the master files for creating digital editions.

Creating master files is the final goal of production in the Nineteenth-Century Knowledge Project. Each master file combines the text of an entry with that entry's index terms. We call them master files because we use them to generate digital editions of the data in any desired format, such as HTML or TXT. The structure of this folder matches that of the entry Folder.
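As a rough illustration of how one master file might feed a plain-text edition, the sketch below reads a TEI document and returns its body text and index terms. It rests on assumptions not stated on this page: that master files are TEI XML in the standard TEI namespace, and that index terms sit in `<term>` elements under `<keywords>`. The function names `master_to_txt` and `index_terms` and the sample document are hypothetical, not the project's actual scripts or data.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed here, not confirmed by the project docs.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def master_to_txt(xml_source: str) -> str:
    """Return the text of the TEI <body> with whitespace collapsed."""
    root = ET.fromstring(xml_source)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    # itertext() yields every text node in document order.
    return " ".join("".join(body.itertext()).split())


def index_terms(xml_source: str) -> list[str]:
    """Return index terms, assuming they are encoded as keywords/term."""
    root = ET.fromstring(xml_source)
    terms = root.iterfind(".//tei:keywords/tei:term", TEI_NS)
    return [t.text.strip() for t in terms if t.text and t.text.strip()]


# Hypothetical master file, invented for this example.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><profileDesc><textClass><keywords>"
    "<term>Astronomy</term><term>Optics</term>"
    "</keywords></textClass></profileDesc></teiHeader>"
    "<text><body><p>TELESCOPE, an optical   instrument.</p></body></text>"
    "</TEI>"
)

if __name__ == "__main__":
    print(master_to_txt(SAMPLE))
    print(index_terms(SAMPLE))
```

An HTML edition could be produced from the same file by swapping the text extraction for a template or an XSLT transformation, which is the sense in which one master file can serve several output formats.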

Contact: nckp@temple.edu