Issue #134 merge partitions with inconsistent grids amongst partitions #216

JoerivanEngelen · 2024-02-13T16:01:46Z

Add support for merging partitions that do not all contain the same grids, for example a 1D grid in only one partition and a 2D grid in every partition. Quite some refactoring was required:

- Groups grids, data_objects, and other_vars by gridname.
- All partitions with a certain grid should have the same variables; this is validated.
- Connectivity dims are padded to the maximum size if they are not consistent.
- Adds a property to AbstractUgrid to get the connectivity dimension names.
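
The steps listed above can be illustrated with a minimal, library-free sketch. It is not the code from this PR: the dict layout of a partition, the helper names (`group_by_gridname`, `validate_same_variables`, `merge_connectivity`), and the `-1` fill value are assumptions made for illustration only. The sketch also skips the node renumbering and de-duplication that a real merge has to do.

```python
from collections import defaultdict

FILL_VALUE = -1  # assumption: marks an unused slot in a connectivity row


def group_by_gridname(partitions):
    """Collect the entries of every grid under its grid name.

    Each partition is a hypothetical dict of the form
    {gridname: {"variables": set_of_names, "connectivity": list_of_rows}}.
    A grid may be missing from some partitions.
    """
    grouped = defaultdict(list)
    for partition in partitions:
        for gridname, grid in partition.items():
            grouped[gridname].append(grid)
    return dict(grouped)


def validate_same_variables(gridname, grids):
    """Raise if partitions that share a grid carry different variables."""
    expected = grids[0]["variables"]
    for grid in grids[1:]:
        if grid["variables"] != expected:
            raise ValueError(
                f"Partitions of grid {gridname!r} have inconsistent variables: "
                f"{sorted(expected)} versus {sorted(grid['variables'])}"
            )


def merge_connectivity(grids):
    """Pad every row to the widest row found, then stack all rows."""
    width = max(len(row) for grid in grids for row in grid["connectivity"])
    merged = []
    for grid in grids:
        for row in grid["connectivity"]:
            merged.append(list(row) + [FILL_VALUE] * (width - len(row)))
    return merged


# Two partitions: only the first contains the 1D grid, and the 2D grid has
# triangles (3 nodes) in one partition and quadrilaterals (4 nodes) in the other.
partitions = [
    {
        "mesh2d": {"variables": {"waterlevel"}, "connectivity": [[0, 1, 2]]},
        "mesh1d": {"variables": {"discharge"}, "connectivity": [[0, 1]]},
    },
    {
        "mesh2d": {"variables": {"waterlevel"}, "connectivity": [[0, 1, 2, 3]]},
    },
]

grouped = group_by_gridname(partitions)
for gridname, grids in grouped.items():
    validate_same_variables(gridname, grids)

merged_2d = merge_connectivity(grouped["mesh2d"])
print(merged_2d)  # [[0, 1, 2, -1], [0, 1, 2, 3]]
```

Grouping by grid name first means the variable check and the padding only ever compare partitions that actually contain the grid in question, so a grid that is absent from some partitions no longer has to be an error.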

veenstrajelmer · 2024-02-14T08:22:58Z

Tested with both examples from #134 and #86 and they both work. However, I do notice odd behaviour. When using xu.open_mfdataset the merging takes 0.3 seconds. When using xu.open_dataset the merging takes >260 seconds:

import datetime as dt
import glob

import xugrid as xu

file_nc = r'p:\1230882-emodnet_hrsm\GTSMv5.0\SO_NHrivGTSM\computations\BD014_fix_mapformat4\output\gtsm_model_0*_map.nc'
file_nc_list = glob.glob(file_nc)

# Open every partition file separately. Replacing xu.open_mfdataset with
# xu.open_dataset here reproduces the slow (>260 sec) merge.
partitions = []
for file_nc_one in file_nc_list:
    uds_part = xu.open_mfdataset(file_nc_one)
    partitions.append(uds_part)

# Time only the merge itself, not the opening of the files.
dtstart = dt.datetime.now()
uds = xu.merge_partitions(partitions)
print(f'merging took: {(dt.datetime.now()-dtstart).total_seconds():.2f} sec')

Using xarray 2023.11.0

Huite · 2024-02-14T08:33:36Z

I guess you can also try loading the datasets with just xr.open_dataset, then turning them into UgridDatasets, then merging them.
That will give the same result, I'm fairly sure. The xugrid.open_mfdataset just adds the data_vars kwarg then calls xr.open_mfdataset.
From the source, parallel=False is the default in the open_mfdataset call, and that flag only determines whether dask is used to open the files in parallel, so it shouldn't make a difference here. But maybe there's something more subtle going on as well.

Is there maybe a difference in eagerness? If you call .compute() in the timing, are the timings the same?

veenstrajelmer · 2024-02-14T09:01:45Z

@Huite, thanks for your suggestions. Indeed, the combination of ds=xr.open_dataset() and xu.UgridDataset(ds) behaves the same. Your second suggestion, calling .compute(), indeed improves the performance significantly (0.8 sec). This also reminded me of the importance of the chunks argument (chunks={"time":1} is the default in dfm_tools), which also worked like a charm (0.5 sec). So I guess we can conclude that this PR introduced no additional problem, and that it also does what it aims to do.
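
The loading variants compared in this thread can be sketched as follows. This is only an illustration under assumptions: `open_partition` and `timed_merge` are hypothetical helper names, and `.compute()` is applied to each partition before merging, which the thread does not state explicitly. The xarray and xugrid imports sit inside the function so the timing helper can be used on its own.

```python
import time
from typing import Callable, Optional, Sequence


def open_partition(path: str, eager: bool = False, chunks: Optional[dict] = None):
    """Open one partition file as a UgridDataset.

    eager=True loads the data into memory before wrapping it;
    chunks (for example {"time": 1}) opens the file lazily with dask.
    """
    import xarray as xr
    import xugrid as xu

    ds = xr.open_dataset(path, chunks=chunks)
    if eager:
        ds = ds.compute()
    return xu.UgridDataset(ds)


def timed_merge(partitions: Sequence, merge: Callable):
    """Run merge(partitions) and return the result with the elapsed seconds."""
    start = time.perf_counter()
    merged = merge(partitions)
    return merged, time.perf_counter() - start


# Usage sketch (file pattern is a placeholder, not a path from this thread):
#
#   import glob
#   import xugrid as xu
#
#   paths = glob.glob("model_0*_map.nc")
#   chunked = [open_partition(p, chunks={"time": 1}) for p in paths]
#   uds, seconds = timed_merge(chunked, xu.merge_partitions)
#   print(f"merging took: {seconds:.2f} sec")
```

Passing the merge function as an argument keeps the timing code identical for every variant, so any difference in the reported seconds comes from how the partitions were opened.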

veenstrajelmer

Thanks for implementing this. There seemed to be some issues, but with the input from @Huite I think we can conclude that this PR works like a charm.

docs/changelog.rst

Huite · 2024-02-13T17:13:03Z

 Changed
 ~~~~~~~

 - :meth:`xugrid.Ugrid2d.from_structured` now takes ``x`` and ``y`` arguments instead
  of ``x_bounds`` and ``y_bounds`` arguments.
+- :func:`xugrid.merge_partitions` allows merging partitions with different grids


Uh, this is added under both "fixed" and "changed"? Maybe mention it only under "fixed".

JoerivanEngelen · 2024-02-14

I was a bit in doubt:

I added it to changed, because it is a change of behavior: prior to this changeset, we explicitly tested whether all grids occur in every partition and threw an error if this was not the case. Therefore this changeset doesn't really "fix" a bug, it alters behavior.

On the other hand, the changeset brings xugrid closer to supporting UGRID conventions, so if the premise of xugrid is to support every UGRID file (at least: those without mistakes), you can argue this is a "fix".

For now, I just kept the text under "changed".

xugrid/ugrid/partitioning.py

Huite · 2024-02-14T19:19:25Z

Great job!

JoerivanEngelen added 16 commits February 9, 2024 11:50

Start working on issue, deactivate some breaking validation, add some todos

65c34b7

Merge branch 'main' into issue_134_merge_partitions_1D2D

4062710

* Add sizes property

92ab064

* Add ``max_face_node_dimension`` property for Ugrid2D * Add ``max_connectivity_dimensions`` property

Update changelog

9866cb2

* Support merging partitions with different grids per partition

e2bd159

* Add function to maybe pad connectivity max dims * Group data objects and other vars by gridname

Format

ba6e259

Update changelog

151f416

Add test

68b2d94

Support merging partitions with inconsistent grids

8ab89ad

Fix comments and drop mesh1d_nEdges in paritition as well.

30d12b9

Fix validation

5815063

Add test

147f3d4

Add validation if vars in all data objects and ensure all other_vars are added to merged

edfa12b

Add extra tests and adapt some tests

1570561

format

92fd23f

Update changelog

0c85d0b

JoerivanEngelen added the enhancement label Feb 13, 2024

JoerivanEngelen requested review from Huite and veenstrajelmer February 13, 2024 16:01

JoerivanEngelen self-assigned this Feb 13, 2024

veenstrajelmer approved these changes Feb 14, 2024

Huite requested changes Feb 14, 2024

JoerivanEngelen added 5 commits February 14, 2024 11:15

Rename max_connectivity_dimensions to max_connectivity_sizes and let the former method return a tuple of names instead

22d93a2

Remove duplicate message

8c465bc

Add type annotations and add return None

694c31c

format

c925667

type annotate separate_variables and fix comment

0eb60f4

JoerivanEngelen added 6 commits February 14, 2024 13:41

Add type annotation

96f814b

Remove useless filter loop

d9b3494

Type annotate merge_partitions

4b262e4

Simplify logic with reviewer's suggestions

365ee19

format

20181c3

Fix typo in docstring

a96d298

Huite approved these changes Feb 14, 2024

Huite merged commit 16b6c9a into main Feb 14, 2024
5 checks passed

Huite deleted the issue_134_merge_partitions_1D2D branch February 14, 2024 19:19

JoerivanEngelen mentioned this pull request Feb 15, 2024

merge_partitions fails for grid with long_culverts #86

Closed
