Master Files

After creating entry files, we run an automated metadata generation process to add index terms to each entry.

Attention: This section is no longer maintained and is posted for information only.

A master file combines an entry file with metadata terms appropriate to its content, to support search and analysis. A master file keeps the name of its entry file, with the letter m appended to the filename stem (the part before the extension).
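
As an illustration of the naming rule, here is a minimal Python sketch. The function name master_filename and the sample filename entry0001.xml are hypothetical and are not taken from the project's scripts.

```python
from pathlib import PurePosixPath


def master_filename(entry_path: str) -> str:
    """Return the master file path: the entry path with 'm' appended to the stem."""
    path = PurePosixPath(entry_path)
    # Keep the directory and extension, and append "m" to the stem only.
    return str(path.with_name(path.stem + "m" + path.suffix))


# Hypothetical entry filename, for illustration only.
example = master_filename("entries/entry0001.xml")
print(example)
```

Under this assumption, an entry file named entry0001.xml would have a master file named entry0001m.xml.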

To create index terms for 113,000 entries, we use an automated process, developed through extensive testing, that aims for the most relevant results. A Python script first runs named entity recognition (NER) on a plain-text (TXT) version of the entry. It then passes the same TXT file and the NER results through a vocabulary server, HIVE2, to generate the index terms for the entry. The script writes those terms into the TEI header of the entry, along with links to the online authority file for each term.

To shorten these links, each one begins with a prefix (the part preceding the colon). The <listPrefixDef> element in the header contains the information needed to expand the prefix to a complete URL in output files.
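
The final step, writing terms into the header, might be sketched as follows. This is not the project's script: the function build_keywords is hypothetical, the terms are hard-coded instead of coming from NER and HIVE2, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_keywords(scheme, prefix, terms):
    """Build a <keywords> element from (identifier, label) pairs."""
    keywords = ET.Element("keywords", {"scheme": scheme})
    for identifier, label in terms:
        # Each link is stored in shortened form: prefix, colon, identifier.
        term = ET.SubElement(keywords, "term", {"ref": f"{prefix}:{identifier}"})
        term.text = label
    return keywords


# Sample terms copied from the header listing; a real run would take them
# from the vocabulary server's response (an assumption of this sketch).
fast_terms = [("993678", "Law"), ("1027064", "Motion")]
element = build_keywords("http://id.worldcat.org/fast/", "fast", fast_terms)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The sketch produces one <keywords> element per vocabulary, so calling it once for each scheme would yield the two <keywords> blocks that appear in the header.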

The script output within the TEI header looks like the following.

        <encodingDesc>
            <listPrefixDef>
                <prefixDef ident="fast" matchPattern="(\d+)"
                    replacementPattern="http://id.worldcat.org/fast/$1">
                    <p>In the context of the Knowledge Project, URIs with the prefix "fast:" point
                        to terms in the OCLC FAST (Faceted Application of Subject Terminology)
                        subject heading schema derived from the Library of Congress Subject Headings
                        (LCSH).</p>
                </prefixDef>
                <prefixDef ident="lcsh1910" matchPattern="(\w+)"
                    replacementPattern="https://n2t.net/ark:/99152/$1">
                    <p>In the context of the Knowledge Project, URIs with the prefix "lcsh1910:"
                        point to subject terms in the volume <hi rend="italic">Subject Headings Used
                            in the Dictionary Catalogues of the Library of Congress</hi>, 7 vols.,
                        Library of Congress, 1910.</p>
                </prefixDef>
            </listPrefixDef>
        </encodingDesc>
        <profileDesc>
            <textClass>
                <keywords scheme="http://id.worldcat.org/fast/">
                    <term ref="fast:993678">Law</term>
                    <term ref="fast:933281">Founding</term>
                    <term ref="fast:1012448">Matter</term>
                    <term ref="fast:987598">Kindness</term>
                    <term ref="fast:1027064">Motion</term>
                    <term ref="fast:825145">Bacon</term>
                </keywords>
                <keywords scheme="https://id.cci.drexel.edu/">
                    <term ref="lcsh1910:b4m90280p">Law</term>
                    <term ref="lcsh1910:b47m04609">Power</term>
                    <term ref="lcsh1910:b44j09z32">Founding</term>
                    <term ref="lcsh1910:b4vx0636c">Matter</term>
                    <term ref="lcsh1910:b4x63b58d">Kindness</term>
                </keywords>
            </textClass>
        </profileDesc>
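
To show how a shortened link can be expanded to a complete URL, here is a minimal Python sketch based on the two <prefixDef> elements in the listing. The names PREFIX_DEFS and expand_ref are hypothetical, the patterns are copied by hand instead of being parsed from the header, and only the $1 placeholder is handled.

```python
import re

# matchPattern and replacementPattern values copied from the <prefixDef>
# elements in the header listing.
PREFIX_DEFS = {
    "fast": (r"(\d+)", "http://id.worldcat.org/fast/$1"),
    "lcsh1910": (r"(\w+)", "https://n2t.net/ark:/99152/$1"),
}


def expand_ref(ref: str) -> str:
    """Expand a prefixed ref such as 'fast:993678' to a complete URL."""
    prefix, _, local = ref.partition(":")
    if prefix not in PREFIX_DEFS:
        return ref  # unknown prefix: leave the value unchanged
    match_pattern, replacement = PREFIX_DEFS[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return ref  # the part after the colon does not fit the pattern
    # Substitute the first captured group for the $1 placeholder.
    return replacement.replace("$1", match.group(1))


print(expand_ref("fast:993678"))
print(expand_ref("lcsh1910:b4m90280p"))
```

With these definitions, fast:993678 expands to http://id.worldcat.org/fast/993678 and lcsh1910:b4m90280p expands to https://n2t.net/ark:/99152/b4m90280p.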