
Weighting a tree by a tree of dataarrays isn't possible #193

Open
TomNicholas opened this issue Jan 10, 2023 · 3 comments

Comments

@TomNicholas
Collaborator

TomNicholas commented Jan 10, 2023

ds.weighted(da).mean() can't be easily generalised to involve two trees, because there is no tree version of a DataArray. This is one reason why I haven't tried to write .weighted for datatree yet.

We should be able to generalise an operation involving multiple datasets, such as ds.weighted(ds).mean(), to dt.weighted(dt).mean() (as it's equivalent to dt + dt from map_over_subtree's perspective), but that particular weighted operation doesn't actually make sense.

We could also do dt.weighted(da).mean(), but I don't see how to make something like dt.weighted(dt).mean() work nicely.

The problem is that you can't weight by multiple variables, and at the moment you also can't extract a single variable from each node whilst retaining the tree structure.

You could do this, though; it's well-defined:

def weighted_mean_da(ds):
    return ds["data"].weighted(ds["var"]).mean()

dt.map_over_subtree(weighted_mean_da)
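
A self-contained sketch of what that looks like, using made-up variable names ("data" for the data and "var" for the weights) and the standalone datatree package (the exact import location may differ between versions):

import numpy as np
import xarray as xr
from datatree import DataTree

# Made-up example: every node's Dataset carries both the data variable
# and the weights variable it should be weighted by.
ds = xr.Dataset(
    {
        "data": ("x", np.array([1.0, 2.0, 3.0])),
        "var": ("x", np.array([0.2, 0.3, 0.5])),
    }
)
dt = DataTree.from_dict({"/a": ds, "/b": ds * 2})

def weighted_mean_da(ds):
    # Per-node weighted mean: pull out the data variable, weight it by the
    # weights variable living in the same node, and wrap the resulting
    # DataArray back up as a Dataset so the tree can be rebuilt around it.
    return ds["data"].weighted(ds["var"]).mean().to_dataset(name="data")

# Applies the function to every node that has data and reassembles the
# results into a new tree with the same structure.
result = dt.map_over_subtree(weighted_mean_da)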

cc @jbusecke

@jbusecke
Contributor

In the end I think it is actually desirable to nudge the user to store the weights as a variable/coordinate in the dataset! I think that will avoid errors when doing something like a weighted mean over many different nodes.
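
For example, something like this (with entirely made-up variable names), where each node's Dataset already carries its weights as a coordinate:

import numpy as np
import xarray as xr

# Made-up node-level Dataset: the weights ("cell_area") live in the same
# Dataset as the data ("tas"), stored as a coordinate.
ds = xr.Dataset(
    {"tas": ("x", np.array([280.0, 285.0, 290.0]))},
    coords={"cell_area": ("x", np.array([1.0, 2.0, 3.0]))},
)

# The weighted mean then needs nothing external to the node, which is
# easy to apply over every node of a tree.
weighted_mean = ds["tas"].weighted(ds["cell_area"]).mean()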

@TomNicholas
Collaborator Author

> In the end I think it is actually desirable to nudge the user to store the weights as a variable/coordinate in the dataset! I think that will avoid errors when doing something like a weighted mean over many different nodes.

Why exactly? That is not ideal because that's not what xarray's Dataset.weighted currently expects, so the API would be inconsistent.

@jbusecke
Contributor

Ah ok, I see the point; my argument was purely based on practical considerations. If you have already assigned the weights as a coordinate, you have presumably already dealt with alignment and other possible issues. But that previous step would run into the same issues (e.g. there would be some dt.assign_coords() step that also expects one or more DataArrays as input).
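
Something like this, sketched with a made-up "cell_area" variable, where the promotion to a coordinate itself has to be mapped over the nodes:

def promote_weights(ds):
    # Promote the (made-up) "cell_area" data variable in each node to a
    # coordinate, so later per-node weighted operations can find it.
    return ds.set_coords("cell_area")

dt = dt.map_over_subtree(promote_weights)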

> That is not ideal because that's not what xarray's Dataset.weighted currently expects

I do not fully follow this particular argument. Are you saying that something like this:

res = ds.weighted(ds.some_var).mean()

is somehow not recommended?
