Generating the Manifest
The manifest stores all of the parameters for HIVE
HIVE2 needs to know which vocabularies to use for each file and what the parameters are for each vocabulary (minimum word count, for example). All of this information is stored in a CSV file that also lists the names of all the files to be processed. This file is called the “manifest” for HIVE. We create it with a Python script.
- Run auto-manifest.py on the batch folder.
  - Provide the edition number (1 or 2 digits) of the batch folder when requested.
  - Provide the letter name when requested.
- The script creates a manifest file and saves it to the batch folder. It names
the file manifest_ followed by the letter name, for example manifest_A.csv.
- The batch folder now contains three files for each entry and a single manifest
file for the batch.
each entry       full text     kp*.txt
each entry       NER Topics    kp*a.csv
each entry       NER Geo       kp*b.csv
one per folder   manifest      manifest_*.csv
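
The naming rules and folder contents described above can be checked with a short script. The following is a minimal sketch, not part of auto-manifest.py: the function names manifest_name, valid_edition, and check_batch_folder are hypothetical, and the check assumes that the three files for an entry share the same stem (for example kp001.txt, kp001a.csv, kp001b.csv), which the file patterns suggest but do not guarantee.

```python
import re
from pathlib import Path


def manifest_name(letter):
    """Build the manifest file name from the letter name, e.g. 'A' -> 'manifest_A.csv'."""
    return f"manifest_{letter}.csv"


def valid_edition(text):
    """Return True if text is an edition number of 1 or 2 digits."""
    return re.fullmatch(r"\d{1,2}", text) is not None


def check_batch_folder(folder):
    """Return a list of problems found in a batch folder; an empty list means none were found."""
    folder = Path(folder)
    problems = []

    # There should be exactly one manifest_*.csv per batch folder.
    manifests = sorted(p.name for p in folder.glob("manifest_*.csv"))
    if len(manifests) != 1:
        problems.append(f"expected 1 manifest, found {len(manifests)}")

    # Assumption: an entry's NER files reuse the stem of its kp*.txt file.
    for text_file in sorted(folder.glob("kp*.txt")):
        stem = text_file.stem
        for suffix in ("a.csv", "b.csv"):
            if not (folder / f"{stem}{suffix}").is_file():
                problems.append(f"missing {stem}{suffix}")

    return problems
```

Running check_batch_folder on the batch folder after the manifest has been generated returns a list of anything that appears to be missing. The sketch only starts from the full-text files, so an entry whose kp*.txt file is itself absent would not be reported.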