
Project Description: Phase 1

Amanda Ross edited this page May 17, 2018 · 24 revisions

Phase 1 Summary

  • Timeframe
    • October 2016 to September 2017
  • Staff
    • Metadata Specialist: Amanda Ross (@WaxCylinderRevival)
    • Search Prototype Designer: Joe Wicentowski (@joewiz)
    • Backlog Publishers: Virginia Kinniburgh, Stephanie Eckroth
  • By the Numbers
    • Between October 21, 2016* and August 16, 2017, we've added at least one dateline to 49,916 historical documents (a 29.93% increase).
    • As of August 30, 2017, 99.55% of historical documents (excluding attachments) in the FRUS digital archive have at least one dated dateline.
| Category | October 12, 2016 | August 16, 2017 | Change | [August 30, 2017] |
| --- | --- | --- | --- | --- |
| Total Documents | 192,930 | 233,684 | +40,754 documents (21.12% increase) | --- |
| I. Editorial Notes | 7,765 (4.02%) | 7,916 (3.39%) | +151 editorial notes (1.94% increase) | --- |
| II. Historical Documents | 185,165 (95.98%) | 225,768 (96.61%) | +40,603 historical documents (21.93% increase) | --- |
| IIa. Historical Documents w/at least 1 dateline | 166,774 (90.00%) | 216,690 (95.98%) | +49,916 (29.93% increase) | --- |
| IIb. Historical Documents (excluding attachments) w/at least 1 dateline | --- | --- | --- | [225,113 (99.71%)] |
| IIc. Historical Documents (excluding attachments) w/at least 1 dateline//date | --- | --- | --- | [224,751 (99.55%)] |

[* October 21, 2016 is the date of the first FRUS-dates-project commit. October 12, 2016 is the date of the first query-based analysis of the FRUS corpus.]

Project Brief

As of August 2017, nearly all historical documents (that is, documents other than editorial notes) have been given a date or date range, including those left “undated” by past FRUS compilers.

  • We attempted to establish the most precise date/date range possible for each document, using:

    • Document header
    • Document content
    • Chapter or subchapter headings with dates/dateTimes
    • Dates of sibling documents within the same chapter or subchapter
    • Outside research
    • Logical rules
    • Volume date spans
  • We used the same clues to generate date ranges for imprecise dates such as “April 1976”.
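As a rough illustration of what a date range for an imprecise date can look like, the sketch below expands a month-level date such as “April 1976” into its first and last day. The function name `month_range` is hypothetical, and the choice of @notBefore/@notAfter as the target attributes is an assumption; this is not the project's actual tooling.

```python
import calendar
from datetime import date

def month_range(year, month):
    """Expand a month-level date into the first and last day of that month."""
    # monthrange returns (weekday of day 1, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return {
        "notBefore": date(year, month, 1).isoformat(),
        "notAfter": date(year, month, last_day).isoformat(),
    }

# "April 1976" spans 1976-04-01 through 1976-04-30
print(month_range(1976, 4))
```

In practice, document content or sibling dates could narrow such a range further than the calendar month.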

  • We also identified non-Gregorian dates within the text, declared the original calendar used, and converted to Gregorian/UTC. Examples include:

<dateline>Dated the <date when="1947-10-09" calendar="tibetan-phugpa"
  >25th of the 8th month of Tibetan Fire-Pig Year [1947]</date>.</dateline>

<dateline>
<date when="1865-06-18" ana="#date_undated-inferred-from-document-content" 
  calendar="masonic-anno-lucis">24th day of the 3d month, 
  in the year of light 5865</date>
</dateline>
  • Each date touched has received an @ana attribute that tells editors the reason for, or source of, the assigned date/date range. This keeps an analytical history behind each machine-readable date assignment. Examples include:

    • #date_apparent-typo-based-on-document-content
    • #date_apparent-typo-based-on-document-scan
    • #date_apparent-typo-based-on-outside-research
    • #date_editorial-correction
    • #date_imprecise-inferred-from-date-rules
    • #date_imprecise-inferred-from-document-content
    • #date_imprecise-inferred-from-document-scan
    • #date_imprecise-inferred-from-document-content-and-sibling-dates
    • #date_imprecise-inferred-from-outside-research
    • #date_imprecise-inferred-from-sibling-dates
    • #date_undated-inferred-from-chapter-heading
    • #date_undated-inferred-from-document-content
    • #date_undated-inferred-from-document-content-and-sibling-dates
    • #date_undated-inferred-from-document-head
    • #date_undated-inferred-from-document-scan
    • #date_undated-inferred-from-sibling-dates
    • #date_undated-inferred-from-outside-research
  • These dates/date ranges and the @ana reasoning can be revised/updated as needed.
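To show how these attributes can be read back by a program, here is a minimal sketch that parses the Masonic dateline example from this page and pulls out @when, @calendar, and the reasoning carried by @ana. The function name `describe_dates` is hypothetical, and the sample is parsed without a namespace; documents that declare the TEI namespace would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Sample copied from the dateline example on this page (no namespace declared).
SAMPLE = """<dateline>
<date when="1865-06-18" ana="#date_undated-inferred-from-document-content"
  calendar="masonic-anno-lucis">24th day of the 3d month,
  in the year of light 5865</date>
</dateline>"""

def describe_dates(xml_text):
    """List the machine-readable date, calendar, and @ana reason of each date."""
    root = ET.fromstring(xml_text)
    results = []
    for date_el in root.iter("date"):
        ana = date_el.get("ana", "")
        # Strip the "#date_" prefix to leave only the reasoning label.
        reason = ana[len("#date_"):] if ana.startswith("#date_") else None
        results.append({
            "when": date_el.get("when"),
            "calendar": date_el.get("calendar"),
            "reason": reason,
        })
    return results

print(describe_dates(SAMPLE))
```

A date with no @ana value comes back with a reason of `None`, which makes untouched dates easy to tell apart from revised ones.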

  • We leveraged placeName + date to add appropriate time zone adjustments, when needed.

    • We relied on https://www.timeanddate.com/ to identify appropriate historic time zones, which have shifted greatly throughout the FRUS publication span.
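The place-plus-date lookup can be pictured roughly as follows. This is only a sketch: the table `HISTORIC_OFFSETS`, its single row, and the function `localize` are hypothetical stand-ins for research we did by hand, and real entries would have to be checked for each place and period.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical hand-built table: (place, first day, last day, UTC offset in hours).
# The single row is illustrative only (Eastern Standard Time in early 1962).
HISTORIC_OFFSETS = [
    ("Washington", date(1962, 1, 1), date(1962, 3, 31), -5),
]

def localize(place, naive_iso):
    """Attach the UTC offset in force at `place` on that day, if the table knows it."""
    dt = datetime.fromisoformat(naive_iso)
    for name, start, end, hours in HISTORIC_OFFSETS:
        if name == place and start <= dt.date() <= end:
            return dt.replace(tzinfo=timezone(timedelta(hours=hours))).isoformat()
    return None  # unknown place or period: leave for manual research

print(localize("Washington", "1962-01-15T10:30:00"))
```

Returning `None` for anything outside the table keeps a missing entry visible instead of silently applying a modern offset.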
  • From there, we took the values of the manually established @when | @from, @to | @notBefore, @notAfter attributes to derive a minimum dateTime (div/@frus:doc-dateTime-min) and a maximum dateTime (div/@frus:doc-dateTime-max) for each document. The search prototype works on the range from div/@frus:doc-dateTime-min to div/@frus:doc-dateTime-max.
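A minimal sketch of how a minimum and maximum dateTime might be derived from those attributes is shown below. It rests on several assumptions that this page does not confirm: date-only values are widened to 00:00:00 for the minimum and 23:59:59 for the maximum, values without an offset get a default of -05:00, and the names `DEFAULT_TZ`, `_bound`, and `doc_datetime_range` are hypothetical. Inputs are assumed to be full dates or dateTimes.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumed default offset for values that carry none (hypothetical choice).
DEFAULT_TZ = timezone(timedelta(hours=-5))

def _bound(value, end_of_day):
    """Turn an ISO date or dateTime string into an offset-aware datetime."""
    if "T" in value:
        dt = datetime.fromisoformat(value)
    else:
        # Date only: widen to the first or last second of that day.
        clock = time(23, 59, 59) if end_of_day else time(0, 0, 0)
        dt = datetime.combine(date.fromisoformat(value), clock)
    return dt if dt.tzinfo else dt.replace(tzinfo=DEFAULT_TZ)

def doc_datetime_range(dates):
    """Return (min, max) ISO strings over a document's date attribute sets."""
    lows, highs = [], []
    for attrs in dates:
        for name in ("when", "from", "notBefore"):
            if name in attrs:
                lows.append(_bound(attrs[name], end_of_day=False))
        for name in ("when", "to", "notAfter"):
            if name in attrs:
                highs.append(_bound(attrs[name], end_of_day=True))
    if not lows or not highs:
        return None  # nothing usable to build a range from
    return min(lows).isoformat(), max(highs).isoformat()

print(doc_datetime_range([{"when": "1947-10-09"}]))
```

Each dictionary stands for the attributes of one date element, so a document with several dates yields the earliest lower bound and the latest upper bound across all of them.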

  • The documents should sort chronologically by the first day of their estimated or known date/date range. Where the time is not known, they are sorted as if dated 12:00 a.m. on that day.
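The sorting rule can be sketched as below. The document ids and the `min` key are hypothetical, and the example assumes all values follow one offset convention (here, none at all) so that they compare cleanly.

```python
from datetime import datetime

def sort_documents(docs):
    """Sort documents by the start of their date range."""
    # fromisoformat reads a date-only value as 00:00 (12:00 a.m.) on that day.
    return sorted(docs, key=lambda doc: datetime.fromisoformat(doc["min"]))

docs = [
    {"id": "d3", "min": "1962-01-16"},
    {"id": "d1", "min": "1962-01-15T09:00:00"},
    {"id": "d2", "min": "1962-01-15"},  # time unknown, treated as midnight
]

print([doc["id"] for doc in sort_documents(docs)])
```

Because a date without a time is read as midnight, such a document sorts ahead of any timed document from the same day.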

(For more on completed work and future development, please visit Issue Tracking)


Previous: Introduction | Next: Phase 2