Master Files
After creating entry files, we run an automated metadata generation process to add index terms to each entry.
Master files combine an entry file with metadata
terms appropriate to the content, to assist in search and analytical operations.
Filenames remain the same except that the letter m is appended
to the stem.
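The naming rule can be sketched in a few lines of Python. This is only an illustration: the function name `master_path` and the filename `example-0001.xml` are hypothetical and are not taken from the project's scripts or file inventory.

```python
from pathlib import Path


def master_path(entry_path: str) -> Path:
    """Return the master-file path: same name, with "m" appended to the stem."""
    p = Path(entry_path)
    # Keep the directory and extension; only the stem gains a trailing "m".
    return p.with_name(p.stem + "m" + p.suffix)


# Hypothetical entry filename, used only to show the transformation.
print(master_path("entries/example-0001.xml").name)
```

Because only the stem changes, an entry file and its master file sort next to each other and keep the same extension.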
To create index terms for 113,000 entries, we use an automated process designed after
extensive testing to produce the most relevant results. In this process, a Python script runs named entity recognition (NER) on a TXT
version of the entry. It then passes the same TXT file and the NER results through a
vocabulary server, HIVE2, to generate the index terms for the entry.
Those terms are then written into the TEI header of the entry, along
with links to the online authority file for each term. To keep these links short, each one
begins with a prefix (the part preceding the colon). The <listPrefixDef>
element in the header contains the information needed to expand the prefix in output
files to a complete URL.
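As a rough illustration of that expansion, the following Python sketch applies the match and replacement patterns from this project's `fast` and `lcsh1910` prefix definitions to a `ref` value. The dictionary `PREFIX_DEFS` and the function `expand_ref` are hypothetical names introduced here; the patterns are copied by hand rather than read from a header, and this is not the project's own output code.

```python
import re

# Patterns copied by hand from the <prefixDef> elements in the header:
# prefix -> (matchPattern, replacementPattern).
PREFIX_DEFS = {
    "fast": (r"(\d+)", "http://id.worldcat.org/fast/$1"),
    "lcsh1910": (r"(\w+)", "https://n2t.net/ark:/99152/$1"),
}


def expand_ref(ref: str) -> str:
    """Expand a prefixed pointer such as "fast:993678" to a complete URL."""
    prefix, sep, local = ref.partition(":")
    if not sep or prefix not in PREFIX_DEFS:
        return ref  # no known prefix: leave the value unchanged
    match_pattern, replacement = PREFIX_DEFS[prefix]
    m = re.fullmatch(match_pattern, local)
    if m is None:
        return ref  # the local part does not fit the declared pattern
    # TEI writes group references as $1; Python's re module expects \g<1>.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return m.expand(py_replacement)


print(expand_ref("fast:993678"))
print(expand_ref("lcsh1910:b4m90280p"))
```

Values whose prefix is not declared, including ordinary `http:` URLs, are returned unchanged, so the function can be applied to every `ref` attribute without checking first.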
The script output within the TEI header looks like the following.
<encodingDesc>
<listPrefixDef>
<prefixDef ident="fast" matchPattern="(\d+)"
replacementPattern="http://id.worldcat.org/fast/$1">
<p>In the context of the Knowledge Project, URIs with the prefix "fast:" point
to terms in the OCLC FAST (Faceted Application of Subject Terminology)
subject heading schema derived from the Library of Congress Subject Headings
(LCSH).</p>
</prefixDef>
<prefixDef ident="lcsh1910" matchPattern="(\w+)"
replacementPattern="https://n2t.net/ark:/99152/$1">
<p>In the context of the Knowledge Project, URIs with the prefix "lcsh1910:"
point to subject terms in the volume <hi rend="italic">Subject Headings Used
in the Dictionary Catalogues of the Library of Congress</hi>, 7 vols.,
Library of Congress, 1910.</p>
</prefixDef>
</listPrefixDef>
</encodingDesc>
<profileDesc>
<textClass>
<keywords scheme="http://id.worldcat.org/fast/">
<term ref="fast:993678">Law</term>
<term ref="fast:933281">Founding</term>
<term ref="fast:1012448">Matter</term>
<term ref="fast:987598">Kindness</term>
<term ref="fast:1027064">Motion</term>
<term ref="fast:825145">Bacon</term>
</keywords>
<keywords scheme="https://id.cci.drexel.edu/">
<term ref="lcsh1910:b4m90280p">Law</term>
<term ref="lcsh1910:b47m04609">Power</term>
<term ref="lcsh1910:b44j09z32">Founding</term>
<term ref="lcsh1910:b4vx0636c">Matter</term>
<term ref="lcsh1910:b4x63b58d">Kindness</term>
</keywords>
</textClass>
</profileDesc></codeblock></example>
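A `<keywords>` block like the ones above could be assembled with Python's standard `xml.etree.ElementTree` module, roughly as follows. This is a minimal sketch under stated assumptions: the helper `build_keywords` is a hypothetical name, the terms are passed in as ready-made (ref, label) pairs rather than coming from the NER and HIVE2 steps, and the TEI namespace that a complete file would declare is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_keywords(scheme, terms):
    """Build a <keywords> element from (ref, label) pairs."""
    keywords = ET.Element("keywords", {"scheme": scheme})
    for ref, label in terms:
        # Each term carries its prefixed pointer in @ref and its label as text.
        term = ET.SubElement(keywords, "term", {"ref": ref})
        term.text = label
    return keywords


# Two terms taken from the sample output; a real run would supply many more.
text_class = ET.Element("textClass")
text_class.append(
    build_keywords(
        "http://id.worldcat.org/fast/",
        [("fast:993678", "Law"), ("fast:1027064", "Motion")],
    )
)
print(ET.tostring(text_class, encoding="unicode"))
```

Keeping one `<keywords>` element per vocabulary, as in the sample, lets each block state its own `scheme` while the `ref` values stay short.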