Improve reading model output #194

dbrakenhoff · 2023-06-21T12:14:44Z

Use dask.delayed to load model output. With delayed=True, this avoids reading all data into memory. Optionally, chunk the data array for memory-efficient calculations on large data arrays.

Adds two kwargs to output methods:

- delayed: if True, do not load data into memory, default is False
- chunked: if True, chunk data array using da.chunk("auto"), default is False

The default behavior is the same as before. For memory-intensive output, use delayed=True and optionally chunked=True, e.g.:

heads_orig = nlmod.gwf.output.get_heads_da(ds)  # read all data into memory
heads_delayed = nlmod.gwf.output.get_heads_da(ds, delayed=True)  # memory efficient
heads_chunked = nlmod.gwf.output.get_heads_da(ds, delayed=True, chunked=True)  # chunked
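To illustrate what the two kwargs do, here is a minimal sketch of the general pattern, not the nlmod implementation. It assumes dask, numpy and xarray are installed. The reader function `read_heads_at`, the grid dimensions and the output times are hypothetical stand-ins for reading one time step from a binary head file.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr

NLAY, NROW, NCOL = 2, 3, 4  # hypothetical grid dimensions
TIMES = [1.0, 2.0, 3.0]  # hypothetical output times

read_log = []  # records which time steps were actually read


def read_heads_at(time):
    """Hypothetical stand-in for reading one time step from a head file."""
    read_log.append(time)
    return np.full((NLAY, NROW, NCOL), time, dtype=float)


def heads_da(delayed=False, chunked=False):
    """Build a (time, layer, y, x) DataArray, eagerly or lazily."""
    if delayed:
        # Wrap each read in dask.delayed; nothing is read until compute().
        steps = [
            da.from_delayed(
                dask.delayed(read_heads_at)(t),
                shape=(NLAY, NROW, NCOL),
                dtype=float,
            )
            for t in TIMES
        ]
        data = da.stack(steps)
    else:
        # Eager path: every time step is read into memory right away.
        data = np.stack([read_heads_at(t) for t in TIMES])
    arr = xr.DataArray(
        data, dims=("time", "layer", "y", "x"), coords={"time": TIMES}
    )
    if chunked:
        # Let dask pick chunk sizes for the whole array.
        arr = arr.chunk("auto")
    return arr


lazy = heads_da(delayed=True, chunked=True)
n_reads_before = len(read_log)  # still 0: building the array read nothing
last_mean = float(lazy.isel(time=-1).mean().compute())
```

Building the lazy array only records what to read; the reads happen when `compute()` (or any operation that needs values) is called.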

- add kwarg delayed, if False, load data into memory, else return data array with delayed dask arrays
- add kwarg chunked, if True, chunk data array with chunks="auto"
- add x,y data to vertex grid data array

OnnoEbbens

Good job!

I was wondering whether we should make these functions cacheable using the cache_netcdf decorator, but I think it is too hard (for too little gain), because cache_netcdf requires a dataset as input and for these functions the dataset is optional. However, if these functions do become annoyingly slow we could evaluate this again, because it is certainly possible to use the cache here.

nlmod/gwf/output.py

nlmod/mfoutput.py

- fix comments
- add docstrings

- fix where command for dry/noflow
- improve support for loading output without gwf or ds

- new folder mfoutput
- add new flopy binary read functions that support multithreading (binaryfile.py)
- separate reading budget and head output
- modify gwf.output and gwt.output to use new methods
- split logic into multiple reusable functions
- add support for grb files
- add/improve tests
- add method to obtain dims, coords from modelgrid object

dbrakenhoff · 2023-07-19T16:09:35Z

Alright, the idea is still the same but I refactored the code significantly to increase readability and simplify things.

The idea now is that the binary output file (HeadFile or CellBudgetFile) and the modelgrid together contain enough information to construct a DataArray. If you pass only a filename, e.g. get_heads_da(fname=fname), you will receive a warning that the grid information is missing. You can still load the data, but it will be placed on a default grid definition that does not contain the correct spatial coordinates. To supply the grid information, pass the grbfile=<path to binary grid file> keyword argument.
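As a rough illustration of why the grid information matters, the sketch below derives cell-center coordinates for a structured grid and falls back to a default origin with a warning when none is given. This is an assumption-laden sketch, not the code in this PR: `cell_centers`, `structured_coords` and their parameters are hypothetical names, and only the standard library is used.

```python
import warnings


def cell_centers(origin, widths):
    """Return cell-center coordinates along one axis, starting at origin."""
    centers = []
    edge = origin
    for w in widths:
        centers.append(edge + w / 2)
        edge += w
    return centers


def structured_coords(delr, delc, xorigin=None, yorigin=None):
    """Hypothetical helper: x, y cell centers from column and row widths."""
    if xorigin is None or yorigin is None:
        # Without grid information the coordinates are not georeferenced.
        warnings.warn("grid information missing, using default origin (0, 0)")
        xorigin, yorigin = 0.0, 0.0
    x = cell_centers(xorigin, delr)
    # Rows are numbered from the top of the grid, so y decreases with row index.
    ytop = yorigin + sum(delc)
    y = [ytop - offset for offset in cell_centers(0.0, delc)]
    return {"x": x, "y": y}


coords = structured_coords(
    delr=[100.0, 100.0, 200.0], delc=[50.0, 50.0], xorigin=1000.0, yorigin=2000.0
)
```

With the origin supplied, the coordinates land at their real-world positions; without it, the same data would be placed on a grid starting at (0, 0).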

- mfoutput/mfoutput.py contains general logic for converting data from flopy binary file objects (HeadFile and CellBudgetFile) to data arrays. I defined a number of helper functions to reduce duplication and keep functions short.
- mfoutput/binaryfile.py contains code to read data from binary output files with support for multithreading. The code is copied from flopy, but reduced to only what is necessary. We do not support the same range of data access options as flopy.
- gwt/output.py has been modified to use these new general functions for concentration data.
- gwf/output.py has been modified to use these new general functions for head and budget data.
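One common way to make binary reads safe across threads is to let every read open its own file handle and seek to the record it needs, so no file position is shared. The sketch below shows that pattern with the standard library only. It is not the code in binaryfile.py and it does not follow the MODFLOW file layout: the fixed-size record format and all names are assumptions made for illustration.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

NCELLS = 5  # hypothetical number of cells per record
RECORD_FMT = f"<{NCELLS}d"  # little-endian doubles, no header (simplified)
RECORD_SIZE = struct.calcsize(RECORD_FMT)


def write_records(path, records):
    """Write fixed-size records back to back."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(struct.pack(RECORD_FMT, *rec))


def read_record(path, index):
    """Read one record; each call opens its own handle, so it is thread-safe."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))


def read_all(path, n_records, max_workers=4):
    """Read all records concurrently, returned in record order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, i.e. by record index.
        return list(pool.map(lambda i: read_record(path, i), range(n_records)))


records = [tuple(float(10 * k + c) for c in range(NCELLS)) for k in range(8)]
with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "heads.bin")
    write_records(fname, records)
    result = read_all(fname, len(records))
```

Because each read is independent, the same function can also be wrapped in a delayed call and scheduled by dask's threaded scheduler.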

nlmod/gwf/output.py

nlmod/dims/grid.py

nlmod/plot/flopy.py

OnnoEbbens

see my comments

- codacy
- fix comments @OnnoEbbens
- some additional fixes

dbrakenhoff added 2 commits June 21, 2023 14:05

Use dask delayed to read model output

5376c19

add delayed, chunked kwargs to output methods

2179557

dbrakenhoff requested a review from OnnoEbbens July 17, 2023 10:21

OnnoEbbens approved these changes Jul 17, 2023

dbrakenhoff added 3 commits July 17, 2023 15:09

address @OnnoEbbens comments

09cebd9

improve mfoutput

83bc51d

dbrakenhoff requested a review from OnnoEbbens July 19, 2023 16:00

dbrakenhoff added 2 commits July 19, 2023 18:11

remove deprecation from docstring

f8d58e3

add test data and modify gitignore

17b121c

OnnoEbbens approved these changes Jul 20, 2023

OnnoEbbens reviewed Jul 20, 2023

OnnoEbbens reviewed Jul 20, 2023

OnnoEbbens requested changes Jul 20, 2023

update PR

09c5a7d

dbrakenhoff requested a review from OnnoEbbens July 20, 2023 10:39

OnnoEbbens approved these changes Jul 20, 2023

dbrakenhoff merged commit 945e302 into dev Jul 20, 2023
1 of 2 checks passed

dbrakenhoff deleted the mfoutput branch July 20, 2023 11:59
