Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom shapes #20

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Allow custom shapes #20

wants to merge 6 commits into from

Conversation

brynpickering
Copy link
Member

@brynpickering brynpickering commented Apr 29, 2021

This addresses an issue concerning the upstream dependency of euro-calliope on the output of this workflow. Namely, the introduction of custom shapes. I've gone for quite a simple approach:

  1. You can define a new value that isn't related to nuts, gadm, or lau in the layer config (e.g. my-custom-shapes)
  2. You can define a location for the shapefile for that new value in data-sources by using the same string name (my-custom-shapes: path_to_shapefile.geojson)
  3. The rule will then build administrative_borders.gpkg based on all data sources requested from config. This allows it to add in new layers for any custom shapes and to avoid building a layer entirely if not requested by a layer in config. For example, one could remove the need to consider lau units by not having the municipal layer in their config.

This has led to injecting a python script into the snakemake rules, which has then led me to restructure the renewablepotentialslib package to handle that in a more lightweight way.

Two potentially brittle parts:

  • it expects any trailing digits on the data source description to be indicative of the layer in the data source (e.g. nuts2 -> nuts, but also it will think of my-custom-shapes999 as referring to data source my-custom-shapes). I think this is fine, as the process will fail if it removes trailing digits and then can't find a corresponding data source.
  • it expects the user-input shapefile to correspond to the layer schema (below) + a level column, referring to either the data source name (e.g. my-custom-shapes) or level number (e.g. 2 for my-custom-shapes2). The level part could be optional, to default to the source name if not in the geodataframe.
{
    "properties": {
        "country_code": "str", # id of the country to which the unit belongs
        "id": "str", # a unique id of this unit
        "name": "str", # the name of the unit, not necessarily unique
        "type": "str", # the type of the unit
        "proper": "bool" # flag indicating proper administrative unit (not the case for water bodies e.g.)
    },
    "geometry": "MultiPolygon"
}

Copy link
Member

@timtroendle timtroendle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general approach.

I have mainly one consequential comment about your implementation. Have you considered keeping the new function in the Snakefiles? I would not have it live in the lib, but rather in the Snakefiles. Here's why:

  • It is strongly linked to the workflow.
  • Having it in the lib requires you to install the lib in the base env.
  • It's really small and does not depend on anything else in the lib.

The only downside would be that it's more difficult (or impossible?) to test. But I think the pros outweigh this con.

type: string
enum:
- gadm0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pity. Not really bad though.

environment.yaml Outdated
@@ -6,3 +6,6 @@ dependencies:
- python=3.6
- snakemake-minimal=5.4
- pycountry=20.7.3
- pip
- pip:
- -e ./lib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lib requires numpy and so you need list numpy here otherwise it's going to be pip-installed. Also, pinning the version number would be good.

@@ -6,7 +6,7 @@
import pandas as pd
from pandas.testing import assert_frame_equal

from renewablepotentialslib.conversion import (
from renewablepotentialslib.geo.conversion import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to rename the parent folder of this test to geo -- the structure above lib/tests/ should resemble the structure within potentialslib in my opinion.

@@ -0,0 +1,10 @@
def collect_shape_dirs(config, rules):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being in the public API of the lib, this function should have a proper docstring.

@@ -0,0 +1,10 @@
def collect_shape_dirs(config, rules):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may not like this comment: I don't think this function should be in the lib. This is for three reasons:

  1. It is strongly linked to the workflow. Not only does it use the Snakemake workflow object rules, but also it expects certain rules to exist in the workflow.
  2. Having it in the lib requires you do install the lib in the base env which I think we should keep as clean as possible because it's not controlled by Snakemake itself. Most of the changes in this PR are only necessary because of that.
  3. It's really small and does not depend on anything else in the lib.

For these reasons, I believe the function should live in data-preprocessing.smk rather than the lib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's where I first put it, for all the reasons you give above. However, I can't test it when it sits in the middle of the Snakefile, I assume... Perhaps a 'rule_utils.py' file in the rules dir?

environment.yaml Outdated
@@ -6,3 +6,6 @@ dependencies:
- python=3.6
- snakemake-minimal=5.4
- pycountry=20.7.3
- pip
- pip:
- -e ./lib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in favour of installing the lib into the base env, as this env should be as lean as possible.

@@ -47,11 +45,10 @@ def normalise_admin_borders(path_to_nuts, path_to_gadm, path_to_lau, crs, scope_


if __name__ == "__main__":
shape_dirs = {k: v for k, v in snakemake.input.items() if k != "src"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems brittle. Isn't it possible to give the whole thing a name?

So instead of {"nuts": something_with_nuts} this whole thing would be {"shapes": {"nuts": something_with_nuts}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that snakemake can track inputs that are defined inside a nested dictionary...

for key in ["nuts", "gadm"]:
assert collected[key] == f"you_got_{key}"

def test_will_fail(self, rules):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give this test a more self-explanatory name? I don't understand why this is failing. Is it because of the dashes?

@brynpickering
Copy link
Member Author

I have now moved this function out of the lib and it sits instead in with the snakemake rules in a specific util file. I have updated the testing mechanism to be able to cover this, at least within the snakemake test rule. This repo could probably restructure according to calliope-project/euro-calliope#120 in the longer run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants