Merge EPUB Resource Caption and Alternative Text with Resource Metadata

Timothy W Belch edited this page Oct 28, 2019 · 5 revisions

This monograph processing step can be completed by using the following project script:

script/process_monograph [-p <publisher_directory>] update_metadata <monograph_noid> [<monograph_noid>..]

publisher_directory  Directory that contains a specific publisher's monographs.
monograph_noid       Monograph NOID

The script will perform the following:

  • Locate the monograph directory (MONOGRAPH_DIR=FULCRUM_DRIVE/<publisher>/<ebook_isbn>_<author_last_name>) in the publisher directory. The publisher directory is the one given by the -p <publisher_directory> option or, if the option is omitted, the current working directory.
  • Locate the monograph resource metadata file (METADATA_CSV=MONOGRAPH_DIR/resources/<ebook_isbn>_<author>.csv).
  • Locate the monograph EPUB (EPUB_FILE=MONOGRAPH_DIR/<ebook_isbn>_<author>.epub).
  • Scan EPUB_FILE and determine a mapping from each resource reference to a resource file. This information is saved in the CSV file RESOURCE_MAP=MONOGRAPH_DIR/resource_processing/resource_map.csv.
  • Scan EPUB_FILE and extract the resource caption and alternative text. This information is saved in the CSV file MONOGRAPH_DIR/resource_processing/captions.csv.
  • Load METADATA_CSV and, for each resource, merge in the extracted caption and alternative text. Save the modified resource metadata in the file NEW_METADATA_CSV=MONOGRAPH_DIR/resource_processing/<ebook_isbn>_<author>.csv.
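
The final merge step can be pictured with a minimal Python sketch. This is not the project script's implementation. It assumes that both CSV files share a key column named "File Name" and that the extracted values live in columns named "Caption" and "Alternative Text". These column names, and the function names, are hypothetical and may not match the real file headers.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real CSV headers may differ.
KEY = "File Name"
MERGED_FIELDS = ("Caption", "Alternative Text")


def merge_captions(metadata_rows, caption_rows, key=KEY, fields=MERGED_FIELDS):
    """Return copies of metadata_rows with caption fields merged in by key."""
    captions = {row[key]: row for row in caption_rows}
    merged = []
    for row in metadata_rows:
        new_row = dict(row)  # leave the caller's rows untouched
        extracted = captions.get(row.get(key), {})
        for field in fields:
            value = extracted.get(field, "")
            if value:
                # Overwrite only when the EPUB scan supplied a value.
                new_row[field] = value
            else:
                # Keep any existing value; otherwise add an empty column.
                new_row.setdefault(field, "")
        merged.append(new_row)
    return merged


def read_rows(path):
    """Read a CSV file into a list of dictionaries keyed by header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_rows(path, rows):
    """Write rows to a CSV file, creating the parent directory if needed."""
    if not rows:
        return
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, calling write_rows(NEW_METADATA_CSV, merge_captions(read_rows(METADATA_CSV), read_rows(captions_csv))) would produce the merged file while leaving METADATA_CSV itself unchanged, which matches the script's behavior of writing its result into the resource_processing directory.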

After successful completion of the script, perform the following manual steps:

  • Copy METADATA_CSV to the MONOGRAPH_DIR/Archive directory.
  • Replace METADATA_CSV with NEW_METADATA_CSV within the MONOGRAPH_DIR/resources directory.
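
For anyone who prefers to automate the two manual steps, the following is a minimal Python sketch, not part of the project script. The function name archive_and_replace is hypothetical, and the refusal to overwrite an existing archived copy is an added safety assumption rather than something this page prescribes.

```python
import shutil
from pathlib import Path


def archive_and_replace(metadata_csv, new_metadata_csv, archive_dir):
    """Copy the current metadata CSV to the archive, then replace it.

    Returns the path of the archived copy.
    """
    metadata_csv = Path(metadata_csv)
    new_metadata_csv = Path(new_metadata_csv)
    archive_dir = Path(archive_dir)

    if not new_metadata_csv.is_file():
        raise FileNotFoundError(f"merged metadata not found: {new_metadata_csv}")

    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / metadata_csv.name
    if archived.exists():
        # Assumption: never silently overwrite an earlier archived copy.
        raise FileExistsError(f"archive copy already exists: {archived}")

    # Step 1: copy METADATA_CSV to the Archive directory.
    shutil.copy2(metadata_csv, archived)
    # Step 2: replace METADATA_CSV with NEW_METADATA_CSV.
    shutil.copy2(new_metadata_csv, metadata_csv)
    return archived
```

Here metadata_csv corresponds to METADATA_CSV, new_metadata_csv to NEW_METADATA_CSV, and archive_dir to MONOGRAPH_DIR/Archive. Because the archive copy is made before the replacement, the original file is still recoverable if the second step fails.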