OxCGRT data merge, npi model computation docker deployment #523
Merged
Conversation
* added a dummy feature for every intervention, which is turned on when an intervention is turned off for the first time and turned off when the original intervention is put in place again
* extended the data for another month by using the last known countermeasures in each region and data from Johns Hopkins
…updating pip and setuptools
* extrapolation
Prepared a Docker container which can be run on GCP compute instances from GitHub pipelines; defined a conda environment to accelerate the computation of the npi model
* don't run previous steps in workflows; instead, download the latest r_estimates.csv inside Docker - the other steps are quick. Created a script which deletes the instance when Docker exits. This script is copied to the GCP console, but I included it in the repo for consistency
…ance in the command
…ainer, reformatted code
* OxCGRT added data for subregions (e.g. US states), which broke the pipeline. The fix is to filter them out, but we might use them in the future
* Triggering the npi-model computing workflow manually
* Tuned more interactions of the model (to hopefully shrink the confidence interval)
* Made sure that each NUTS sampling process created by pymc3 uses only one thread - the parallelization doesn't work, and this greatly speeds up the computation
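The subregion filter from the commit above might look roughly like this; the `RegionCode` column name follows the public OxCGRT CSV schema, and the repo's actual filtering may differ:

```python
import pandas as pd

def drop_subregions(oxcgrt: pd.DataFrame) -> pd.DataFrame:
    """Keep only country-level OxCGRT rows.

    Subregion rows (e.g. US states) carry a non-empty RegionCode in the
    public OxCGRT CSV; this sketch assumes that schema.
    """
    region = oxcgrt.get("RegionCode")
    if region is None:
        # No subregion column at all, so there is nothing to drop.
        return oxcgrt
    return oxcgrt[region.isna() | (region == "")]
```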
This pull request introduces 3 alerts when merging 89c422f into 70bd566 - view on LGTM.com
witzatom approved these changes on Sep 11, 2020
OxCGRT data merge
Added a luigi step that downloads data from the Oxford COVID Government Response Tracker (OxCGRT) and merges them into the countermeasures data.
The original countermeasures collected for the paper (until the end of May) are in countermeasures_model_data.csv. As described in the issue, we want to use new sources of countermeasures data for the NPI model. So far only the data from OxCGRT are used. At this moment we still don't have newer sources for the Mask wearing and Universities countermeasures; the Businesses countermeasure is derived from a related OxCGRT feature, but we might want to replace it in the future.
The data from the paper take priority over the merged OxCGRT data.
At this moment, the countries for which the NPI model is computed are the original 40 countries from the paper. In the near future, we will want to replace this with a list of about 80 countries.
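A minimal sketch of that priority rule (paper data win, OxCGRT only fills the gaps), assuming both tables share an index and have one column per countermeasure - the actual luigi step is more involved:

```python
import pandas as pd

def merge_countermeasures(paper: pd.DataFrame, oxcgrt: pd.DataFrame) -> pd.DataFrame:
    """Merge OxCGRT data into the paper's countermeasures.

    Illustrative only, not the repo's exact code: values present in the
    paper data are kept, and OxCGRT values are used where the paper data
    has gaps (NaN).
    """
    return paper.combine_first(oxcgrt)

# Tiny usage example: May comes from the paper, June only from OxCGRT.
paper = pd.DataFrame({"Mask Wearing": [1.0, float("nan")]},
                     index=["2020-05-01", "2020-06-01"])
oxcgrt = pd.DataFrame({"Mask Wearing": [0.0, 0.0]},
                      index=["2020-05-01", "2020-06-01"])
merged = merge_countermeasures(paper, oxcgrt)
```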
NPI model GCS Docker computation
A new GitHub workflow was created which builds the Docker image, uploads it to the container registry on Google Cloud, and then creates a VM instance on which it runs the newly created image. A script in the container downloads a CSV with R estimates, which we compute every day, and runs the NPI model. When the computation finishes, it exports the results and uploads them to Google's storage bucket, from where they can be used by the FE. After that, the container runs a gcloud command to remove the instance on which it runs.
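The upload step could be sketched roughly as follows; the bucket handling, the client library, and the ISO format chosen for the <date> part of the file name are assumptions, not taken from the repo:

```python
from datetime import date

def result_blob_names(run_date: date) -> tuple:
    """Stable "latest" name plus a dated copy kept for past predictions.

    The ISO date format is an assumption; the PR only specifies
    "<date>_npi-results.json".
    """
    return ("latest_npi-results.json",
            f"{run_date.isoformat()}_npi-results.json")

def upload_results(local_path: str, bucket_name: str) -> None:
    """Hypothetical upload helper, not the repo's actual script."""
    from google.cloud import storage  # requires google-cloud-storage
    bucket = storage.Client().bucket(bucket_name)
    for name in result_blob_names(date.today()):
        bucket.blob(name).upload_from_filename(local_path)
```

Writing the same results under both names gives the FE a stable URL while preserving a dated history of predictions.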
As the model now won't run before the WebExport task, the results of the NPI model are exported into a separate file. Every day the upload-data workflow runs, which processes the Johns Hopkins data, computes the r_estimates, exports them to data-v4.json, and uploads it to the main channel. The compute-npi-model workflow can be run independently of it and can be triggered from the GitHub web UI (only after the workflow file is on master, but it can already be run through the GitHub API). When running the workflow, the name of the channel to which the results will be uploaded can be specified. The docker container uploads two files to the bucket - latest_npi-results.json and a <date>_npi-results.json. This is so that we can show our past predictions in the future.

Notes on performance
During development, I've encountered a few performance issues. The original idea was to take advantage of the parallelism of the pymc3 library, which turned out to be problematic. Although pymc3 allows running multiple jobs, each using multiple threads, in reality there are no performance gains from using more threads (probably issues with the implementation of pymc3 itself). It turned out to work best when each chain uses only one thread. As we're using two chains, a 2-core CPU is enough.
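That single-thread setup can be sketched like this; the environment variables cover the common BLAS/OpenMP backends, and the draws/tune values in the usage comment are placeholders, not the repo's settings:

```python
import os

# Thread limits must be set before theano/pymc3 import the BLAS library,
# since the thread pool is configured at import time. Covering the MKL,
# OpenBLAS, and OpenMP backends.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

def sampler_kwargs(chains: int = 2) -> dict:
    """One OS process per chain, each restricted to a single thread."""
    return {"chains": chains, "cores": chains}

# Usage inside the model (illustrative; draws/tune are placeholders):
# with model:
#     trace = pm.sample(draws=1000, tune=500, **sampler_kwargs())
```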
This project uses Poetry as its dependency manager. However, while experimenting with the model, I found out that it actually runs significantly better under Anaconda (due to a more direct interaction between Theano and MKL). Because of this, the Docker container in which the model runs uses a conda environment.
The conda Docker image requires at least 15 GB of disk space on the VM instance. The model needs about 20 GB of memory when running on 40 countries with the current number of countermeasures and a 90-day extrapolation period. When more countries are added, the RAM will have to be increased.