A database of persons included in Office of the Historian publications and datasets

HistoryAtState/people

People

A database of persons included in Office of the Historian publications and datasets. The application provides a searchable list of all entries and an OpenRefine Reconciliation Service.

Data sources

Status

The data and app are in early beta. Caveat: Data identifiers are subject to change.

Dependencies

  • The data in the data collection is XML
  • The application runs in eXist-db and requires version 3.0RC2.
  • Building the installable package requires Apache Ant
  • The OpenRefine Reconciliation Service targets OpenRefine 2.6 Beta

Installation

URL structure

  • Individual records are stored in /people/id/{PERSON_IDENTIFIER}, where {PERSON_IDENTIFIER} is a numerical ID.
  • The default view is HTML.
  • The source XML data for a record can be viewed by appending .xml to the URL, i.e., /people/id/{PERSON_IDENTIFIER}.xml
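As a minimal sketch of the URL scheme above, the following Python helper builds the HTML and XML paths for a record. The function name `record_paths` is hypothetical and not part of the application; only the path patterns come from the list above.

```python
def record_paths(person_id: int) -> dict:
    """Return the HTML and XML paths for a person record.

    Hypothetical helper: it only mirrors the documented patterns
    /people/id/{PERSON_IDENTIFIER} and /people/id/{PERSON_IDENTIFIER}.xml.
    """
    html_path = f"/people/id/{person_id}"
    return {
        "html": html_path,          # default view
        "xml": html_path + ".xml",  # source XML for the same record
    }


paths = record_paths(100001)
print(paths["html"])
print(paths["xml"])
```

The paths are relative; the host and any application prefix depend on where the package is deployed.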

Data size and organization

The initial size of the dataset is ~16,000 person records. Each person record is assigned a numerical ID. In the mid-to-long term, the dataset will approach and eventually exceed 100,000 records.

The numerical IDs assigned to person records begin with 100,001. (Starting with 100,001 instead of 1 keeps every ID the same length, which helps with sorting in integer-ignorant environments that compare IDs as text and so treat 10 as coming before 2.)
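The sorting issue can be illustrated with a short Python sketch that sorts IDs as strings, the way an integer-ignorant environment would. The sample IDs are chosen for illustration only.

```python
# IDs that start at 1 have varying lengths, so text sorting misorders them.
ids_from_one = ["1", "2", "10"]
sorted_from_one = sorted(ids_from_one)

# IDs that start at 100001 all have six digits, so text order matches numeric order.
ids_from_100001 = ["100001", "100002", "100010"]
sorted_from_100001 = sorted(ids_from_100001)

print(sorted_from_one)     # ['1', '10', '2']
print(sorted_from_100001)  # ['100001', '100002', '100010']
```

Text order and numeric order agree only while every ID has the same number of digits, that is, for IDs below 1,000,000.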

For performance reasons in our environments (git, filesystem, and eXist-db database), we limit the number of files in a directory to 100. To facilitate this, we use the following directory structure:

Record 100001 is stored in: 100000/1/0/0/0. This directory contains 100001.xml through 100099.xml.

Record 113400 is stored in: 100000/1/1/3/4. This directory contains 113400.xml through 113499.xml.
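The mapping from a record ID to its directory can be sketched in Python as follows. This is inferred from the two examples above rather than taken from the application code: the function names are hypothetical, and the rule that the top-level directory is the ID rounded down to a multiple of 100,000 is an assumption that the examples confirm only for IDs below 200,000.

```python
def record_directory(person_id: int) -> str:
    """Return the directory that holds a record's XML file.

    Assumptions (inferred from the two documented examples):
    - IDs are six-digit integers starting at 100001.
    - The top-level directory is the ID rounded down to a multiple of 100000.
    - The next four levels are the first four digits of the ID.
    """
    if not 100_001 <= person_id <= 999_999:
        raise ValueError(f"expected a six-digit ID from 100001 up, got {person_id}")
    bucket = person_id // 100_000 * 100_000
    first_four_digits = str(person_id)[:4]
    return "/".join([str(bucket), *first_four_digits])


def record_file(person_id: int) -> str:
    """Return the relative path of a record's XML file (hypothetical helper)."""
    return f"{record_directory(person_id)}/{person_id}.xml"


print(record_directory(100001))  # 100000/1/0/0/0
print(record_directory(113400))  # 100000/1/1/3/4
print(record_file(113499))       # 100000/1/1/3/4/113499.xml
```

Because the last two digits of the ID do not affect the directory, each directory holds at most 100 files, which matches the stated limit.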
