
Weighting a tree by a tree of dataarrays isn't possible #193

Open
TomNicholas opened this issue Jan 10, 2023 · 3 comments

Comments

@TomNicholas
Collaborator

TomNicholas commented Jan 10, 2023

ds.weighted(da).mean() can't be easily generalised to involve two trees, because there is no tree version of a DataArray. This is one reason why I haven't tried to write .weighted for datatree yet.

We should be able to generalise an operation involving multiple datasets, such as ds.weighted(ds).mean(), to dt.weighted(dt).mean() (as it's equivalent to dt + dt from map_over_subtree's perspective), but that particular weighted operation doesn't actually make sense.

We could also do dt.weighted(da).mean(), but I don't see how to make something like dt.weighted(dt).mean() work nicely.

The problem is that you can't weight by multiple variables, and at the moment you also can't extract a single variable from each node whilst retaining the tree structure.

You could do this, though; it's well-defined:

def weighted_mean_da(ds):
    return ds["data"].weighted(ds["var"]).mean()

dt.map_over_subtree(weighted_mean_da)
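
A self-contained sketch of what that looks like, using made-up variable names ("data" for the data and "var" for the weights) and the standalone datatree package (the exact import location may differ between versions):

import numpy as np
import xarray as xr
from datatree import DataTree

# Made-up example: every node's Dataset carries both the data variable
# and the weights variable it should be weighted by.
ds = xr.Dataset(
    {
        "data": ("x", np.array([1.0, 2.0, 3.0])),
        "var": ("x", np.array([0.2, 0.3, 0.5])),
    }
)
dt = DataTree.from_dict({"/a": ds, "/b": ds * 2})

def weighted_mean_da(ds):
    # Per-node weighted mean: pull out the data variable, weight it by the
    # weights variable living in the same node, and wrap the resulting
    # DataArray back up as a Dataset so the tree can be rebuilt around it.
    return ds["data"].weighted(ds["var"]).mean().to_dataset(name="data")

# Applies the function to every node that has data and reassembles the
# results into a new tree with the same structure.
result = dt.map_over_subtree(weighted_mean_da)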

cc @jbusecke

@jbusecke
Contributor

In the end I think it is actually desirable to nudge the user to store the weights as a variable/coordinate in the dataset! I think that will avoid errors when doing something like a weighted mean over many different nodes.
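
For example, something like this (with entirely made-up variable names), where each node's Dataset already carries its weights as a coordinate:

import numpy as np
import xarray as xr

# Made-up node-level Dataset: the weights ("cell_area") live in the same
# Dataset as the data ("tas"), stored as a coordinate.
ds = xr.Dataset(
    {"tas": ("x", np.array([280.0, 285.0, 290.0]))},
    coords={"cell_area": ("x", np.array([1.0, 2.0, 3.0]))},
)

# The weighted mean then needs nothing external to the node, which is
# easy to apply over every node of a tree.
weighted_mean = ds["tas"].weighted(ds["cell_area"]).mean()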

@TomNicholas
Collaborator Author

> In the end I think it is actually desirable to nudge the user to store the weights as a variable/coordinate in the dataset! I think that will avoid errors when doing something like a weighted mean over many different nodes.

Why exactly? That is not ideal because that's not what xarray's Dataset.weighted currently expects, so the API would be inconsistent.

@jbusecke
Contributor

Ah ok, I see the point; my argument was purely based on practical considerations. If you have already assigned the weights as a coordinate, you have presumably already dealt with alignment and other possible issues. But that previous step would run into the same issues (e.g. there would be some dt.assign_coords() step that also expects one or more DataArrays as input).
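
Something like this, sketched with a made-up "cell_area" variable, where the promotion to a coordinate itself has to be mapped over the nodes:

def promote_weights(ds):
    # Promote the (made-up) "cell_area" data variable in each node to a
    # coordinate, so later per-node weighted operations can find it.
    return ds.set_coords("cell_area")

dt = dt.map_over_subtree(promote_weights)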

> That is not ideal because that's not what xarray's Dataset.weighted currently expects

I do not fully follow this particular argument. Are you saying that something like this:

res = ds.weighted(ds.some_var).mean()

is somehow not recommended?
