
Merge EPUB Resource Caption and Alternative Text with Resource Metadata

Timothy W Belch edited this page Oct 23, 2019 · 5 revisions

This monograph processing step can be completed by using the following project script:

script/process_monograph [-s <source_root_dir>] update_metadata <monograph_noid> [<monograph_noid>..]

source_root_dir      Root directory for locating monograph files.
monograph_noid       NOID of a monograph to process; one or more may be given.
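The usage line and the directory lookup it relies on can be sketched as follows. This is a minimal illustration in Python, not the project's actual script. The functions `build_parser`, `resolve_source_root` and `monograph_paths` are hypothetical. The lookup order (the `-s` option, then `WINTERBERRY_FULCRUM_UMP_DIR`, then the current working directory) is assumed from the order given in the step list. The labels `CAPTIONS_CSV` and `OUTPUT_CSV` are also illustrative. How a NOID is mapped to its `<ebook_isbn>_<author>` directory is not described on this page, so that step is omitted.

```python
import argparse
import os
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented usage line:
    #   process_monograph [-s <source_root_dir>] update_metadata <monograph_noid> [<monograph_noid>..]
    parser = argparse.ArgumentParser(prog="process_monograph")
    parser.add_argument("-s", dest="source_root_dir", default=None,
                        help="Root directory for locating monograph files.")
    parser.add_argument("command", choices=["update_metadata"])
    parser.add_argument("monograph_noid", nargs="+", help="Monograph NOID")
    return parser


def resolve_source_root(cli_value):
    # Assumed precedence: -s option, then the environment variable, then the cwd.
    if cli_value:
        return Path(cli_value)
    env_value = os.environ.get("WINTERBERRY_FULCRUM_UMP_DIR")
    if env_value:
        return Path(env_value)
    return Path.cwd()


def monograph_paths(monograph_dir: Path) -> dict:
    # MONOGRAPH_DIR is named <ebook_isbn>_<author>; the EPUB and CSV files reuse that name.
    name = monograph_dir.name
    processing = monograph_dir / "resource_processing"
    return {
        "METADATA_CSV": monograph_dir / "resources" / f"{name}.csv",
        "EPUB_FILE": monograph_dir / f"{name}.epub",
        "RESOURCE_MAP": processing / "resource_map.csv",
        "CAPTIONS_CSV": processing / "captions.csv",    # hypothetical label
        "OUTPUT_CSV": processing / f"{name}.csv",       # hypothetical label
    }
```

For example, `build_parser().parse_args(["-s", "/data/ump", "update_metadata", "abc123", "def456"])` yields `source_root_dir="/data/ump"` and `monograph_noid=["abc123", "def456"]`.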

The script performs the following steps:

  • Locate the monograph directory (MONOGRAPH_DIR=FULCRUM_DRIVE/UMP/<ebook_isbn>_<author>) within the directory specified by the <source_root_dir> option or, if the option is omitted, the directory named by the WINTERBERRY_FULCRUM_UMP_DIR environment variable or, failing that, the current working directory.
  • Locate the monograph resource metadata file (METADATA_CSV=MONOGRAPH_DIR/resources/<ebook_isbn>_<author>.csv).
  • Locate the monograph EPUB (EPUB_FILE=MONOGRAPH_DIR/<ebook_isbn>_<author>.epub).
  • Scan EPUB_FILE and determine a mapping from each resource reference to a resource file. This information is saved in the CSV file RESOURCE_MAP=MONOGRAPH_DIR/resource_processing/resource_map.csv.
  • Scan EPUB_FILE and extract the resource caption and alternative text. This information is saved in the CSV file MONOGRAPH_DIR/resource_processing/captions.csv.
  • Load METADATA_CSV and, for each resource, merge in the extracted caption and alternative text. Save the modified resource metadata to the file MONOGRAPH_DIR/resource_processing/<ebook_isbn>_<author>.csv.
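The caption extraction and merge steps can be sketched in Python. This is an illustration under stated assumptions, not the project's implementation:

- Resources are assumed to appear in the EPUB's XHTML as a `<figure>` that contains an `<img>` (with its `alt` attribute) and an optional `<figcaption>`.
- The resource file is identified by the basename of `img/@src`, which stands in for the RESOURCE_MAP lookup.
- The metadata column names `File Name`, `Caption` and `Alternative Text` are assumed.
- Non-empty EPUB values are assumed to replace existing metadata values.

The functions `extract_captions` and `merge_captions` are hypothetical.

```python
import csv
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_captions(epub_file):
    """Return {file_name: {"caption": ..., "alt_text": ...}} for every <figure>
    with an <img> in the EPUB's XHTML documents (path or file-like object)."""
    captions = {}
    with zipfile.ZipFile(epub_file) as epub:
        for entry in epub.namelist():
            if not entry.endswith((".xhtml", ".html")):
                continue
            root = ET.fromstring(epub.read(entry))
            for figure in root.iter(f"{XHTML_NS}figure"):
                img = figure.find(f".//{XHTML_NS}img")
                if img is None:
                    continue
                figcaption = figure.find(f".//{XHTML_NS}figcaption")
                caption = ""
                if figcaption is not None:
                    # Collapse runs of whitespace in the caption text.
                    caption = " ".join("".join(figcaption.itertext()).split())
                # Assumption: basename of img/@src names the resource file.
                file_name = posixpath.basename(img.get("src", ""))
                captions[file_name] = {"caption": caption,
                                       "alt_text": img.get("alt", "").strip()}
    return captions


def merge_captions(metadata_csv_text, captions):
    """Merge caption and alt text into the metadata rows; return the new CSV text."""
    reader = csv.DictReader(io.StringIO(metadata_csv_text))
    fieldnames = list(reader.fieldnames)
    for column in ("Caption", "Alternative Text"):
        if column not in fieldnames:
            fieldnames.append(column)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        found = captions.get(row.get("File Name", ""))
        if found:
            # Assumption: non-empty EPUB values take precedence over existing ones.
            if found["caption"]:
                row["Caption"] = found["caption"]
            if found["alt_text"]:
                row["Alternative Text"] = found["alt_text"]
        writer.writerow(row)  # missing columns are written as empty strings
    return out.getvalue()
```

`merge_captions` returns CSV text rather than writing a file, so the caller decides where the output goes (for example, MONOGRAPH_DIR/resource_processing/<ebook_isbn>_<author>.csv). Rows with no matching resource in the EPUB pass through unchanged.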