OxCGRT data merge, npi model computation docker deployment #523

Merged
merged 63 commits into master from 518-npi-model-data-merge on Sep 11, 2020

Conversation

JanataPavel
Contributor

OxCGRT data merge

Added a luigi step that downloads data from the Oxford COVID Government Response Tracker (OxCGRT) and merges them into the countermeasures data.
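
The download-and-merge step looks roughly like the sketch below. This is a minimal illustration only; the task name, output path, and OxCGRT URL are assumptions, not the exact code in this PR.

```python
# Minimal sketch of a luigi step that downloads the OxCGRT CSV
# (task name, URL, and output path are illustrative assumptions).
import luigi
import pandas as pd

OXCGRT_URL = (
    "https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/"
    "master/data/OxCGRT_latest.csv"
)

class DownloadOxCGRT(luigi.Task):
    def output(self):
        return luigi.LocalTarget("data/oxcgrt_latest.csv")

    def run(self):
        df = pd.read_csv(OXCGRT_URL, low_memory=False)
        with self.output().open("w") as f:
            df.to_csv(f, index=False)
```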

The original countermeasures collected for the paper (up to the end of May) are in countermeasures_model_data.csv.

As described in the issue, we want to use new sources of countermeasures data for the NPI model. So far, only the data from OxCGRT are used. At the moment we still don't have newer sources for the Mask wearing and Universities countermeasures, and the Businesses countermeasure is derived from a related OxCGRT feature, which we might want to replace in the future.

The data from the paper take priority over the merged OxCGRT data.
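
In pandas terms, the precedence rule is essentially a combine_first: wherever the paper's CSV has a value, it wins, and OxCGRT only fills the gaps. The column and index names below are assumptions made for the sake of the example.

```python
# Illustrative sketch of the precedence rule (column names are assumptions).
import pandas as pd

paper = pd.read_csv("countermeasures_model_data.csv", parse_dates=["Date"])
oxcgrt = pd.read_csv("oxcgrt_features.csv", parse_dates=["Date"])  # hypothetical merged OxCGRT features

paper = paper.set_index(["Code", "Date"])
oxcgrt = oxcgrt.set_index(["Code", "Date"])

# Keep the paper's value wherever it exists, fall back to OxCGRT otherwise.
merged = paper.combine_first(oxcgrt)
```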

At this moment, the countries for which the NPI model is computed are the original 40 countries from the paper. In the near future, we will want to replace this with a list of about 80 countries.

NPI model GCS Docker computation

A new GitHub workflow was created that builds the docker image, uploads it to the container registry on Google Cloud, and then creates a VM instance on which it runs the newly built image. A script in the container downloads a CSV with the R estimates, which we compute every day, and runs the NPI model. When the computation finishes, it exports the results and uploads them to a Google Cloud Storage bucket, from where they can be used by the FE. After that, the container runs a gcloud command to remove the instance it is running on.
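
The container's entry point follows roughly the sequence below. This is a hedged sketch, not the actual script: the URL, bucket, module, instance, and zone names are placeholders.

```python
# Sketch of the container entry point: download R estimates, run the model,
# upload results, then delete the VM. All names below are placeholders.
import datetime
import subprocess

import requests
from google.cloud import storage

R_ESTIMATES_URL = "https://storage.googleapis.com/example-bucket/r_estimates.csv"  # placeholder
RESULTS_BUCKET = "example-results-bucket"  # placeholder

def main():
    # 1. Download the daily R estimates produced by the upload-data workflow.
    resp = requests.get(R_ESTIMATES_URL, timeout=60)
    resp.raise_for_status()
    with open("r_estimates.csv", "wb") as f:
        f.write(resp.content)

    # 2. Run the NPI model (placeholder for the real entry point).
    subprocess.run(["python", "-m", "epimodel.run_npi_model"], check=True)

    # 3. Upload the exported results so the frontend can pick them up.
    client = storage.Client()
    bucket = client.bucket(RESULTS_BUCKET)
    today = datetime.date.today().isoformat()
    for name in ("latest_npi-results.json", f"{today}_npi-results.json"):
        bucket.blob(name).upload_from_filename("npi-results.json")

    # 4. Remove the VM this container is running on (name and zone are placeholders).
    subprocess.run(
        ["gcloud", "compute", "instances", "delete", "npi-model-vm",
         "--zone", "europe-west1-b", "--quiet"],
        check=True,
    )

if __name__ == "__main__":
    main()
```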

As the model no longer runs before the WebExport task, the results of the NPI model are exported into a separate file.
The upload-data workflow runs every day: it processes the Johns Hopkins data, computes the r_estimates, exports them to data-v4.json, and uploads that file to the main channel. The compute-npi-model workflow can be run independently of it and can be triggered from the GitHub web UI (only after the workflow file is in master, but it can already be run through the GitHub API now). When running the workflow, the name of the channel to which the results will be uploaded can be specified. The docker container uploads two files to the bucket - latest_npi-results.json and <date>_npi-results.json - so that we can show our past predictions in the future.
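
Until the workflow file is in master and the web UI button appears, triggering it through the API looks roughly like the sketch below; the repository, workflow file name, and the "channel" input name are assumptions here.

```python
# Hypothetical sketch of triggering the compute-npi-model workflow via the
# GitHub workflow_dispatch API; repo, file name, and input name are assumptions.
import os
import requests

resp = requests.post(
    "https://api.github.com/repos/epidemics/epimodel/actions/workflows/"
    "compute-npi-model.yml/dispatches",
    headers={
        "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github.v3+json",
    },
    json={"ref": "master", "inputs": {"channel": "staging"}},
)
resp.raise_for_status()  # GitHub returns 204 No Content on success
```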

Notes on performance

During development, I encountered a few performance issues. The original idea was to take advantage of the parallelism of the pymc3 library, which turned out to be problematic. Although pymc3 allows running multiple jobs, each using multiple threads, in practice there are no performance gains from using more threads (probably due to issues in the implementation of pymc3 itself). It turned out to work best when each chain uses only one thread. As we're using 2 chains, a 2-core CPU is enough.
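
Concretely, the single-thread-per-chain setup amounts to capping the BLAS/OpenMP thread pools before pymc3 (and theano) are imported and then sampling with one core per chain. The model below is just a placeholder to show where the settings go.

```python
# Sketch of the "one thread per chain" configuration; the model is a dummy.
import os

# Must be set before theano/pymc3 are imported.
os.environ["OMP_NUM_THREADS"] = "1"   # limit OpenMP threads per sampling process
os.environ["MKL_NUM_THREADS"] = "1"   # limit MKL threads per sampling process

import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=1.0)
    pm.Normal("obs", mu=mu, sigma=1.0, observed=[0.1, -0.2, 0.3])
    # 2 chains on 2 cores: each NUTS process gets its own core and one thread.
    trace = pm.sample(draws=500, tune=500, chains=2, cores=2)
```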

This project uses Poetry as its dependency manager. However, while experimenting with the model, I found that it actually runs significantly better with Anaconda (due to a more direct interaction between Theano and MKL). Because of this, the docker container in which the model runs uses a conda environment.

The conda docker image requires at least 15 GB of disk space on the VM instance. The model needs about 20 GB of memory when running on 40 countries with the current number of countermeasures and a 90-day extrapolation period. When more countries are added, the RAM will have to be increased.

JanataPavel and others added 30 commits August 14, 2020 14:45
* added a dummy feature for every intervention, which is turned on when the intervention is turned off for the first time and turned off when the original intervention is put in place again.
* extended the data for another month by using the last known countermeasures in each region and data from Johns Hopkins
Prepared a docker container which can be run on GCP compute instances from GitHub pipelines

defined a conda environment to accelerate the computation of the NPI model
* don't run previous steps in the workflows; instead, download the latest r_estimates.csv inside docker. The other steps are quick
Created a script which deletes the instance when docker exits. This script is copied to the GCP console, but I included it in the repo for consistency
* OxCGRT added data for subregions (e.g. US states), which broke the pipeline. The fix is to filter them out (see the sketch after this list), but we might use them in the future
* Triggering the npi-model computing workflow manually
* More tuning iterations for the model (to hopefully shrink the confidence interval)
* Made sure that each NUTS sampling process created by pymc3 only uses one thread - The parallelization doesn't work and this greatly speeds up the computation
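
For the subregion fix mentioned above, the idea is simply to keep national-level OxCGRT rows. Assuming the CSV marks subnational records with a non-empty RegionCode column, the filter is a one-liner:

```python
# Hypothetical sketch: drop OxCGRT subregion rows (e.g. US states), keeping
# only country-level records; assumes RegionCode is empty on national rows.
import pandas as pd

oxcgrt = pd.read_csv("OxCGRT_latest.csv", low_memory=False)
national_only = oxcgrt[oxcgrt["RegionCode"].isna()]
```
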
@lgtm-com

lgtm-com bot commented Sep 1, 2020

This pull request introduces 3 alerts when merging 89c422f into 70bd566 - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 1 for Unused local variable

@witzatom witzatom self-requested a review September 11, 2020 11:16
@witzatom witzatom merged commit e397f2e into master Sep 11, 2020
@witzatom witzatom deleted the 518-npi-model-data-merge branch September 11, 2020 11:17