Add docs site for archives.getmyvax.org (#1570)
We’ve needed docs for https://archives.getmyvax.org for a while and never got around to writing them. Now that the API and its accompanying docs are disappearing, this adds some docs that live directly on `archives.getmyvax.org` (so whatever else happens to other sites, the docs stay alongside the archive data).

The docs page got quite long once I’d pasted in all the schemas and sources and ID systems and so on, so I split it up into a few pages and used MkDocs to generate a site from the markdown files.

All the source for the docs site is in the `archives` directory. There’s a workflow to build the site and upload it to the `/docs` folder in the `univaf-data-snapshots` S3 bucket where we serve the archives site from.

Part of #1550.
Mr0grog committed Jun 16, 2023
1 parent 65bb372 commit 31c2e0a
Showing 16 changed files with 1,043 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -1,6 +1,7 @@
on:
  pull_request:
    paths-ignore:
      - "archives/**"
      - "docs/**"
      - "terraform/**"
      - "tombstone/**"
@@ -9,6 +10,7 @@ on:
    branches:
      - main
    paths-ignore:
      - "archives/**"
      - "docs/**"
      - "terraform/**"
      - "tombstone/**"
75 changes: 75 additions & 0 deletions .github/workflows/deploy-archives-docs.yml
@@ -0,0 +1,75 @@
name: Deploy Archives Docs

on:
  pull_request:
    paths:
      - "archives/**"
  push:
    branches:
      - main
    paths:
      - "archives/**"

  workflow_dispatch: {}

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
          cache: "pip"

      # No venv here: caching is easier, the environment is ephemeral anyway.
      - name: Install dependencies
        run: |
          cd archives
          pip install -r requirements.txt
      - name: Build Archive Docs
        run: |
          cd archives
          mkdocs build
          # Combine with index redirect page
          mkdir site
          mv dist site/docs
          cp index.html site/
      - uses: actions/upload-artifact@v3
        with:
          name: archives-docs
          path: archives/site/

  deploy:
    if: github.ref == 'refs/heads/main'
    needs:
      - build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: archives-docs

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.ARCHIVE_DOCS_AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.ARCHIVE_DOCS_AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Copy files to S3
        env:
          ARCHIVE_BUCKET: univaf-data-snapshots
        run: |
          aws s3 cp index.html "s3://${ARCHIVE_BUCKET}/index.html"
          aws s3 sync docs "s3://${ARCHIVE_BUCKET}/docs/"
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@ lib-cov

# Dependency directory
node_modules
.venv

# Editors
.idea
32 changes: 32 additions & 0 deletions archives/README.md
@@ -0,0 +1,32 @@
# Historical Archives Documentation

This directory hosts the source code for public documentation about UNIVAF's historical archives, which live at https://archives.getmyvax.org/. The docs site is built with [MkDocs](https://www.mkdocs.org/) and the [Material Theme](https://squidfunk.github.io/mkdocs-material/), and is published to `archives.getmyvax.org/docs/` (so the docs are clearly separated from the actual archive data).

There is also an `index.html` that redirects from `/` to `/docs/` to send browsers that visit `archives.getmyvax.org/` to the docs.


## Setup

1. Make sure you have a recent version of Python 3 (MkDocs is Python-based).

2. Run `./setup.sh` to set up a Python virtual environment and install the dependencies.

This will set up the virtual environment in a `.venv` folder inside this folder, then use `pip` to install the dependencies from `requirements.txt`.

3. To use MkDocs, run `./run-mkdocs.sh <your> <args> <here>`.

- To run the development server: `./run-mkdocs.sh serve`
- To create a static build: `./run-mkdocs.sh build`

You can also activate the virtual environment and run mkdocs directly instead of using the helper:

```bash
# Activate the Python virtual environment:
source ./.venv/bin/activate

# Run mkdocs:
mkdocs serve

# When you're done, deactivate the virtual environment:
deactivate
```
63 changes: 63 additions & 0 deletions archives/docs/index.md
@@ -0,0 +1,63 @@
# UNIVAF Historical Data Archives

This site hosts historical data from UNIVAF, U.S. Digital Response’s COVID-19 Vaccine Appointment Finder API, <https://getmyvax.org/>. **The API is no longer live and was shut down on June 15, 2023.**

The historical data in this archive includes:

- A daily copy of each of the three main tables in the database (starting on June 3, 2021 and ending on June 15, 2023).
- A copy of every update to a location’s availability, grouped into one file for each day (starting on May 19, 2021 and ending on June 15, 2023).
- A final backup of the database in Postgres SQL format (June 16, 2023).
- A final backup of the database in SQLite format (June 16, 2023).

Keep in mind that, since these are historical archives, the format of data has changed over time and data files from different dates may contain different fields. Historical service outages and incidents also impact the data on some days.

Also note that UNIVAF began operation in March 2021, but did not start archiving historical data until May of that year.

For an example of analyzing this data, see <https://github.com/usdigitalresponse/appointment-data-insights>.


## Loading Data Files

Except for the final backups, all files are stored as gzipped, [newline-delimited JSON (NDJSON)](http://ndjson.org/) files.


### Database Copies

Daily copies of the `provider_locations`, `external_ids`, and `availability` tables are stored in a separate directory for each table, and a separate file for each day. Files are named like:

```
https://archives.getmyvax.org/<table>/<table>-<date>.ndjson.gz
```

For example, for the contents of the `provider_locations` table on November 1, 2021, download:

```
https://archives.getmyvax.org/provider_locations/provider_locations-2021-11-01.ndjson.gz
```

Each record in the table is a separate JSON line in the file.
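
As a minimal sketch, a downloaded file can be read in Python with only the standard library. The helper names `table_url` and `iter_ndjson_gz` are illustrative and not part of any UNIVAF tooling, and the sketch assumes the file has already been saved locally:

```python
import gzip
import json
from datetime import date


def table_url(table: str, day: date) -> str:
    """Build the archive URL for one table on one day, following the pattern above."""
    stamp = day.isoformat()
    return f"https://archives.getmyvax.org/{table}/{table}-{stamp}.ndjson.gz"


def iter_ndjson_gz(path):
    """Yield one parsed JSON record per non-empty line of a gzipped NDJSON file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Reading line by line keeps memory use low, which may matter for the larger daily files.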


### Availability Update Logs

In addition to daily copies of the database, you can access lists of every single update to a location’s availability in the `/availability_log` directory. Updates are grouped by day, and files are named like:

```
https://archives.getmyvax.org/availability_log/availability_log-<date>.ndjson.gz
```

For example, to get every update on November 1, 2021, download:

```
https://archives.getmyvax.org/availability_log/availability_log-2021-11-01.ndjson.gz
```

Each update is a separate line in the file. The schema of each record is the same as the `availability` table, but in most cases, _only the fields that changed in that update are filled in_. To get a complete picture of the availability of a location at a given time, you will need to scan backwards in time through the availability logs to find the last complete record for the given source and location ID.
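
One way to approximate this, sketched below, is to process updates oldest first and overlay the fields present in each update onto the last known state, rather than scanning backwards. This is an illustration under stated assumptions, not the project's own method: the key field names `source` and `location_id` are hypothetical and should be checked against the actual schema, and the sketch assumes unchanged fields are either absent or `null`, so it cannot detect a field that was deliberately set to `null`.

```python
def merge_updates(updates, key_fields=("source", "location_id")):
    """Overlay partial updates, oldest first, into one record per key.

    `key_fields` are hypothetical names; check them against the real schema.
    """
    latest = {}
    for update in updates:
        key = tuple(update.get(field) for field in key_fields)
        state = latest.setdefault(key, {})
        # Only overwrite fields that the update actually filled in.
        state.update({k: v for k, v in update.items() if v is not None})
    return latest
```

To cover more than one day, feed the records from several log files in chronological order into the same call.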


### Final Database Backups

A final copy of the database after the service stopped updating is available in Postgres-compatible SQL format and as a SQLite 3 file. Both are gzipped:

- Postgres: `https://archives.getmyvax.org/sql/univaf_postgres_dump-2023-06-16.sql.gz`
- SQLite: `https://archives.getmyvax.org/sql/univaf_sqlite-2023-06-16.sqlite3.gz`
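
As a rough sketch, the SQLite backup can be decompressed and inspected with Python's standard library once it has been downloaded. The function names and local file paths here are illustrative assumptions, and no particular table names are assumed:

```python
import gzip
import shutil
import sqlite3


def decompress(gz_path, out_path):
    """Decompress a .gz file to out_path without loading it all into memory."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def list_tables(db_path):
    """Return the names of the tables in a SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Listing the tables first is a cheap way to confirm the download and decompression worked before running larger queries.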