`_get_topology` loops over all data_vars: xugrid/xugrid/ugrid/conventions.py, lines 183 to 184 in 3dee693.

It seems that when accessing this via `ds.variables.items()` instead of `ds[var]`, the dataarray is not accessed each time, which saves a lot of time in case of many variables. The original method profiles like this:

When replacing the `_get_topology()` code with `[k for k, var in ds.variables.items() if var.attrs.get("cf_role") == "mesh_topology"]` or `[k for k in ds.data_vars if ds.variables[k].attrs.get("cf_role") == "mesh_topology"]` (so adding only `.variables`), the profiler looks like this:

So the timings drop from 16 seconds to <1 second in an example with 5 partitions. This will cause a tremendous improvement when using all 256 partitions of the dataset. Do note that this case covers a dataset with 2410 variables, so it will mostly improve performance of datasets with many variables. Some code to reproduce:
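The original reproduction script is not preserved here. As a stand-in, below is a minimal sketch that compares the two access patterns on a synthetic in-memory dataset. The dataset contents, the variable names (`var_0`, `mesh2d`) and the two helper functions are assumptions made for illustration. The "slow" helper is reconstructed from the description above, not copied from xugrid, and the absolute timings will differ from those of the partitioned dataset.

```python
import time

import numpy as np
import xarray as xr


def topology_via_dataarray(ds):
    # Pattern described as slow: ds[k] builds a DataArray for every variable.
    # (Reconstructed from the issue text, not copied from xugrid.)
    return [k for k in ds.data_vars if ds[k].attrs.get("cf_role") == "mesh_topology"]


def topology_via_variables(ds):
    # Proposed pattern: read attrs from the Variable objects directly.
    return [
        k
        for k, var in ds.variables.items()
        if var.attrs.get("cf_role") == "mesh_topology"
    ]


# Synthetic stand-in for a dataset with many variables (hypothetical contents).
n_vars = 2410
data_vars = {f"var_{i}": ("face", np.zeros(4)) for i in range(n_vars)}
data_vars["mesh2d"] = ((), 0, {"cf_role": "mesh_topology"})
ds = xr.Dataset(data_vars)

for func in (topology_via_dataarray, topology_via_variables):
    start = time.perf_counter()
    result = func(ds)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {result} in {elapsed:.4f} s")
```

Both functions should return the same list of topology names; only the time spent per variable differs.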