Scripts that combine historical emissions records from several datasets (such as CEDS and GFED) to create complete historical emissions files that are input to the IAM emissions harmonization algorithms in IAMconsortium/concordia (regional harmonization and spatial gridding for ESMs) and iiasa/climate-assessment (global climate emulator workflow).
- prototype: the project is just starting up and all of the code is prototype-quality
We do all our environment management using poetry. To get started, you will need to make sure that poetry is installed (instructions here; we found that pipx and pip worked better for installing on a Mac). You may need to upgrade poetry if errors occur, as was the case e.g. here.
To create the virtual environment, run
# Tell poetry to put virtual environments in the project
poetry config virtualenvs.in-project true
poetry install --all-extras
poetry run pre-commit install
These steps are also captured in the Makefile, so if you want a single command you can instead simply run `make virtual-environment`.
Having installed your virtual environment, you can now run commands in your virtual environment using `poetry run <command>`.
For example, to run Python within the virtual environment, run `poetry run python`; to run a notebook server, run `poetry run jupyter lab`.
Note that this repository focuses on processing data and does not currently (re)host the input data files. Files that need to be downloaded before you can run the notebooks are specified in README files in the relevant `data` subfolders, such as `data/national/ceds/data_raw/README.txt` for the CEDS data download and `data/national/gfed/data_raw/README.txt` for the GFED data download.
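Before running the notebooks, it can be useful to verify that the downloads described in those README files are actually in place. A minimal sketch of such a check (this helper is not part of the repository; the directory list mirrors the paths above):

```python
from pathlib import Path

# Raw-data directories that the notebooks expect to be populated
# (taken from the README locations above).
RAW_DIRS = (
    "data/national/ceds/data_raw",
    "data/national/gfed/data_raw",
)

def missing_raw_data(repo_root: Path) -> list[str]:
    """Return the raw-data directories that are absent or hold only a README."""
    missing = []
    for rel in RAW_DIRS:
        d = repo_root / rel
        data_files = (
            [p for p in d.iterdir() if p.is_file() and not p.name.startswith("README")]
            if d.is_dir()
            else []
        )
        if not data_files:
            missing.append(rel)
    return missing
```

Running this from the repository root before starting a notebook session gives a quick list of downloads still to do.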
Data are processed by the Jupyter notebooks (saved as `.py` scripts using jupytext, under the `notebooks` folder). The output paths are generally specified at the beginning of each notebook. For instance, you will find processed CEDS data at `data/national/ceds/processed` and processed GFED data at `data/national/gfed/processed`.
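As a sketch of downstream use, assuming the processed outputs are CSV files (the exact file names and formats are defined in each notebook, so treat any concrete path as a placeholder):

```python
import csv
from pathlib import Path

def read_processed_csv(path: Path) -> list[dict]:
    """Load one processed emissions CSV into a list of header-keyed row dicts."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

For example, `read_processed_csv(Path("data/national/ceds/processed") / "<some_file>.csv")`, where `<some_file>` stands in for whatever the notebook wrote.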
Install and run instructions are the same as above (this is a simple repository without tests etc., so there are no development-only dependencies).
- General functions: in `emissions_harmonization_historical`.
- Data: big files, stored locally in `data`, especially under the `data_raw` subfolders; structured into `national` (e.g. CEDS, GFED) and `global` (e.g. GCB) folders.
- Notebooks: these are the main processing scripts.
  - `01**`: preparing input data for IAMconsortium/concordia.
  - `02**`: preparing input data for iiasa/climate-assessment.
In this repository, we use the following tools:
- git for version control (for more on version control, see general principles: version control)
  - for these purposes, git is a great version-control system, so we don't complicate things any further. For an introduction to git, see this introduction from Software Carpentry.
- Poetry for environment management (for more on environment management, see general principles: environment management)
  - there are lots of environment management systems; Poetry works, and for a simple project like this there is no need to overcomplicate things
  - we track the `poetry.lock` file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
- pre-commit with some very basic settings to get some easy wins in terms of maintenance, specifically:
  - code formatting with ruff
  - basic file checks (removing unneeded whitespace, not committing large files, etc.)
  - (for more thoughts on the usefulness of pre-commit, see general principles: automation)
- track your notebooks using jupytext (for more thoughts on the usefulness of Jupytext, see tips and tricks: Jupytext)
  - this avoids nasty merge conflicts and incomprehensible diffs
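For reference, a Jupytext-paired notebook is stored as an ordinary `.py` script in Jupytext's percent format, where each `# %%` marker delimits one notebook cell. A toy illustration of the cell structure (not an actual notebook from this repository, and the numbers are made up):

```python
# %% [markdown]
# # Toy notebook cell structure (illustrative only)

# %%
# A plain code cell: define some toy emissions totals (not real data)
emissions_kt = {"CO2": 37_000_000, "CH4": 380_000}

# %%
# Another cell: convert kilotonnes to megatonnes
emissions_mt = {gas: kt / 1_000 for gas, kt in emissions_kt.items()}
```

Because the paired file is plain Python, git diffs show only meaningful code changes rather than notebook JSON and embedded outputs.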