allow for significantly less memory usage in steps blending #435

mats-knmi · 2024-09-24T09:32:51Z

By delaying the decomposition of the model data into cascade levels and allowing for model data to be delivered in float32, it seems to be possible to run the blending with almost 10x less memory than before in some tests I did.
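To make the float32 part concrete, here is a minimal NumPy-only sketch of what storing the model data as float32 costs and saves. The array shape and value range are made-up placeholders, not KNMI data, and nothing here calls pysteps itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed toy input of shape (n_models, timesteps+1, m, n); values are arbitrary.
precip64 = rng.uniform(0.1, 50.0, size=(3, 7, 64, 64))
precip32 = precip64.astype(np.float32)

# Storing the same array as float32 halves its memory footprint.
size_ratio = precip64.nbytes / precip32.nbytes

# Upcast a single field back to float64 right before computing with it.
field = precip32[0, 0].astype(np.float64)

# Relative rounding error introduced by the float32 round trip.
rel_err = np.max(np.abs(field - precip64[0, 0]) / precip64[0, 0])

print(f"size ratio: {size_ratio}, max relative rounding error: {rel_err:.1e}")
```

The rounding error stays below the float32 machine epsilon (about 1.2e-7), which is why float32 is usually considered sufficient for storage as long as the arithmetic is done in float64.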

codecov · 2024-09-24T10:54:40Z

Codecov Report

Attention: Patch coverage is 97.82609% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.89%. Comparing base (f211df2) to head (5c14399).

Files with missing lines	Patch %	Lines
pysteps/blending/steps.py	97.82%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #435   +/-   ##
=======================================
  Coverage   83.89%   83.89%           
=======================================
  Files         160      160           
  Lines       12900    12902    +2     
=======================================
+ Hits        10822    10824    +2     
  Misses       2078     2078

Flag	Coverage Δ
unit_tests	`83.89% <97.82%> (+<0.01%)`	⬆️

mats-knmi · 2024-09-24T11:42:02Z

@sidekock I realized yesterday that, with these changes, we can run pysteps a lot cheaper at KNMI. I have tested these changes and it all still seems to run (and fast enough). Maybe you can take these changes along when you start on the refactoring.

RubenImhoff · 2024-09-24T14:28:46Z

Hi @mats-knmi, this looks great, many thanks for the great improvement! Do I understand correctly that the code now prefers to work with the NWP rainfall data instead of the decomposed data (whereas it used to be the other way around)? In other words, do we still want to support decomposed NWP data as input? Dropping it would be a breaking change, so this needs some discussion, but it would make the code cleaner if it is no longer necessary.

RubenImhoff · 2024-09-24T14:32:28Z

In addition, I think we should update the function arguments a bit (see below): we would switch around the array shape and description (favoring the four-dimensional array without decomposition over the two-dimensional array with decomposition) and indicate that float32 is supported and even recommended to reduce memory usage.

    precip_models: array-like
      Array of shape (n_models,timesteps+1) containing, per timestep (t=0 to
      lead time here) and per (NWP) model or model ensemble member, a
      dictionary with a list of cascades obtained by calling a method
      implemented in :py:mod:`pysteps.cascade.decomposition`. In case of one
      (deterministic) model as input, add an extra dimension to make sure
      precip_models is five dimensional prior to calling this function.
      It is also possible to supply the original (NWP) model forecasts containing
      rainfall fields as an array of shape (n_models,timestep+1,m,n), which will
      then be decomposed in this function. Note that for an operational application
      or for testing with multiple model runs, it is recommended to decompose
      the model forecasts outside beforehand, as this reduces calculation times.
      This is possible with :py:func:`pysteps.blending.utils.decompose_NWP`,
      :py:func:`pysteps.blending.utils.compute_store_nwp_motion`, and
      :py:func:`pysteps.blending.utils.load_NWP`.
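As a rough illustration of the non-decomposed input layout described in this docstring, the sketch below builds a float32 array of shape (n_models, timesteps+1, m, n) with plain NumPy, including the extra leading dimension needed for a single deterministic model. The grid size and number of time steps are arbitrary assumptions, and the arrays are zero-filled placeholders rather than real forecasts.

```python
import numpy as np

n_timesteps, m, n = 12, 32, 32  # assumed toy dimensions

# One deterministic model run: (timesteps+1, m, n), stored as float32.
deterministic = np.zeros((n_timesteps + 1, m, n), dtype=np.float32)

# Add the leading model dimension so the array is (n_models, timesteps+1, m, n).
precip_models = deterministic[np.newaxis, ...]
print(precip_models.shape)

# An ensemble is simply stacked along that same leading axis.
members = [np.zeros((n_timesteps + 1, m, n), dtype=np.float32) for _ in range(3)]
precip_ensemble = np.stack(members, axis=0)
print(precip_ensemble.shape)
```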

mats-knmi · 2024-09-24T15:27:59Z

Yes indeed, personally I would prefer to work with rainfall data instead of decomposed data. However, this may matter less if you have fewer NWP model members (as RMI has), since the decomposed data is then not as big as in the KNMI use case, and the small extra performance you get by pre-decomposing might be worth it. So I would be in favor of leaving both options in, and I will try to word this nicely in the docstring.

mats-knmi · 2024-09-24T15:45:01Z

@RubenImhoff I have updated the docstring; let me know if it is OK like this.

RubenImhoff

Looks great so far! I've added a few comments; after that, I think we are good to go. :)

mats-knmi · 2024-09-30T08:38:09Z

@dnerini The unit tests in GitHub Actions suddenly fail during the conda environment initialization. Do you have any idea what is going on here? I didn't change anything that would impact this.

dnerini · 2024-09-30T19:31:58Z

Fixed it temporarily by pinning an older version of micromamba.

RubenImhoff · 2024-10-01T07:40:14Z

With the additions, we may have to add some tests for the coverage. But other than that, I think we're good to go. :) Nice work!

sidekock · 2024-10-03T08:33:24Z

@mats-knmi @RubenImhoff I am planning to look through everything this afternoon.

sidekock · 2024-10-03T11:11:21Z

@mats-knmi, I have some questions before I go more in-depth:

* I assume both options (providing raw NWP data and decomposed data) are still fully supported?
* Do the performance improvements introduced here also apply to the old method of decomposing first?
* Has there been any analysis of the impact of 32-bit vs 64-bit data on the final nowcast?
* Is there a decrease in performance (time for a forecast)? I would expect so, because the computations are now done during the run...

mats-knmi · 2024-10-03T12:47:13Z

@sidekock

* Yes, both options should still work as intended (the unit tests succeed for both options).
* The performance improvements are not related to speed, only to reduced memory usage. In general I expect the memory usage to be lower with both options, but especially when running with non-decomposed model data, as that lets you pass in far less data (7x less with 7 cascade levels).
* No, I haven't done any analysis, but this should not have much impact, as all computations are still done in float64. You only pass in the data as float32, so it takes up less memory. Float32 has around 7 decimal digits of precision, so it should be plenty accurate for storage.
* Yes, in our experience running the blending this way is slightly slower (around 5%, or 10 seconds).

All of this, for us, is more than worth it. We run the blending with a 12-hour lead time, a 5-minute time step, and 20 members. This used to mean that we had to read 100+ GB of decomposed model data into memory before even starting the blending. This is now reduced to around 20 GB.
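The scaling behind these numbers can be sketched with a small back-of-the-envelope calculation. The helper `input_bytes` and the 700 x 700 grid are hypothetical placeholders (the actual KNMI grid is not stated in this thread), so the absolute sizes will not match the figures quoted above; only the ratio between the two layouts is meaningful. The estimate also ignores any per-level statistics stored alongside the cascades.

```python
import numpy as np


def input_bytes(n_members, n_timesteps, grid_shape, dtype, n_levels=1):
    """Bytes needed to hold the model input array (hypothetical helper)."""
    m, n = grid_shape
    return n_members * n_timesteps * n_levels * m * n * np.dtype(dtype).itemsize


# 12-hour lead time at a 5-minute time step, plus t=0.
n_timesteps = 12 * 60 // 5 + 1
grid_shape = (700, 700)  # assumed placeholder grid, not taken from the thread

decomposed64 = input_bytes(20, n_timesteps, grid_shape, np.float64, n_levels=7)
raw32 = input_bytes(20, n_timesteps, grid_shape, np.float32)

print(f"pre-decomposed float64: {decomposed64 / 1e9:.1f} GB")
print(f"raw float32:            {raw32 / 1e9:.1f} GB")
print(f"ratio:                  {decomposed64 // raw32}x")
```

The factor of 14 is just 7 cascade levels times the 2x saving from float32, and it holds for any grid size.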

sidekock

@mats-knmi I did my best to go through all the changes. I added quite a few comments, most of them are just clarifications for me.

sidekock · 2024-10-03T12:40:15Z

pysteps/blending/steps.py

+                            precip_models_cascade_temp
+                        )
+                        done = True
+                        break


Am I correct that you are looping over all members for each time step until you find a member with rain? Are you doing this just to replace the infs with the minimal value in this field? Is the minimal value in the found field necessarily the minimal precipitation value in any field?

mats-knmi · 2024-10-03

This was the most difficult part to get right (see also the extensive discussion Ruben and I had on this in this Pull Request). We need a zero value, and apparently np.nanmin(precip) or np.nanmin(precip_models) are not correct, since the decomposed precip values differ from the non-decomposed values. Therefore we need to decompose the model. To avoid decomposing the entire model here (which would defeat the purpose of not pre-decomposing the model), we need to find a timestep and member that is representative of the entire model, decompose that, and use its nanmin as the zero value. Or at least that is the only way I could see that we could realistically do this. I don't know exactly how the decomposition works, but I would expect an image that has precipitation to have a representative decomposition of which we could take the nanmin. This relies on the assumption that at some point somewhere in the domain there is no rain, but that assumption is made in quite a few other parts of the pysteps code as well.

I would like to add that this code is supposed to solve the issue where precip_cascade is completely filled with nan values, but it doesn't explicitly check whether this is the case, which seems weird to me. I also don't know whether it is possible to get a precip_cascade that is completely filled with nans if you give a sensible value in precip.

sidekock · 2024-10-03

To me, this looks like an unfinished problem. I do see where this comes from: in an operational product, you should be ready for no radar data. But I feel like this is insufficient because, as far as I understand, it can introduce a lot of bias (I would assume a large precip field gives a different min for the different cascades than a very intense convection event; please correct me if I am wrong here, @RubenImhoff). On the other hand, you do not expect the NWP to have huge variations in the 'weather type', so maybe it is fine. The only thing I worry about is that a very intense event with no other rain might give strange artefacts. I assume you used the same code as in "_fill_nans_infs_nwp_cascade" before?
I feel like we at least need to explore a full timestep before taking a min, but that might be too computationally intensive (although we could also parallelize this).
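The search strategy discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: `toy_decompose` is a hypothetical stand-in (a simple radial band split in Fourier space), not the pysteps cascade decomposition, `zero_value_from_first_wet_field` is a made-up name, the fields are assumed to be finite, and the -15 no-rain value is an arbitrary placeholder.

```python
import numpy as np


def toy_decompose(field, n_levels):
    """Hypothetical stand-in for a cascade decomposition (NOT the pysteps one):
    split a finite 2-D field into n_levels radial bands in Fourier space."""
    ky = np.fft.fftfreq(field.shape[0])[:, None]
    kx = np.fft.fftfreq(field.shape[1])[None, :]
    k = np.hypot(ky, kx)
    edges = np.linspace(0.0, k.max(), n_levels + 1)
    edges[-1] = np.inf  # make sure the highest wavenumber lands in the last band
    spectrum = np.fft.fft2(field)
    bands = [
        np.real(np.fft.ifft2(spectrum * ((k >= lo) & (k < hi))))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    return np.stack(bands)


def zero_value_from_first_wet_field(precip_models, n_levels, no_rain_value):
    """Decompose only the first (timestep, member) field that contains rain and
    return its indices plus the per-level minimum, or None if all fields are dry."""
    n_models, n_times = precip_models.shape[:2]
    for t in range(n_times):
        for j in range(n_models):
            field = precip_models[j, t].astype(np.float64)
            if np.any(field > no_rain_value):
                cascade = toy_decompose(field, n_levels)
                return t, j, np.nanmin(cascade, axis=(1, 2))
    return None


# Toy input: two members, three time steps, dry everywhere except one block.
precip_models = np.full((2, 3, 16, 16), -15.0, dtype=np.float32)
precip_models[1, 1, 4:8, 4:8] = 10.0

t, j, zero_values = zero_value_from_first_wet_field(precip_models, 3, -15.0)
print(t, j, zero_values)
```

Only one field is decomposed, so the cost stays small, but the result depends entirely on which field happens to be found first, which is the bias concern raised above.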

sidekock · 2024-10-03T12:42:42Z

pysteps/blending/steps.py

@@ -809,23 +825,80 @@ def forecast(
            if measure_time:
                starttime = time.time()

+            if precip_models_cascade is not None:


I don't think precip_models_cascade is ever initialized; am I wrong about this?

mats-knmi · 2024-10-03

On line 576 it is set to None, and then on line 579 it is set to the input precip_models if its ndim is not equal to 4.

sidekock · 2024-10-03

So that means that if ndim is not 4 (the already decomposed case), it has been defined. In the other case (raw NWP), it is None, and thus the data is decomposed here. Is this correct?
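The dispatch described in this exchange can be summarised in a few lines. This is a simplified sketch based only on the description above, not a copy of the code in pysteps/blending/steps.py; the function name `split_model_input` is hypothetical.

```python
import numpy as np


def split_model_input(precip_models):
    """Return the pre-decomposed cascades if they were supplied, else None.

    A 4-D array (n_models, timesteps+1, m, n) is treated as raw rainfall fields;
    anything else is assumed to be an already decomposed input.
    """
    precip_models_cascade = None
    if precip_models.ndim != 4:
        precip_models_cascade = precip_models
    return precip_models_cascade


# Raw NWP fields: nothing decomposed yet, so decomposition happens later per time step.
raw = np.zeros((2, 4, 16, 16), dtype=np.float32)
print(split_model_input(raw) is None)

# Pre-decomposed input: a 2-D object array holding one cascade dictionary per entry.
decomposed = np.empty((2, 4), dtype=object)
print(split_model_input(decomposed) is decomposed)
```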

sidekock · 2024-10-03T12:57:38Z

So you are now doing this decomposition in the code per time step, and this only adds 10 s for an entire forecast (how long is this nowcast run, e.g. how many time steps?), or is it 10 s per step?
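For readers unfamiliar with the pattern being asked about, here is a minimal sketch of decomposing one time step at a time with a thread pool, so that only the cascades of the current step are held in memory. `two_level_split` is a deliberately trivial, hypothetical stand-in for a real cascade decomposition, and `decompose_timestep` is a made-up helper; neither reflects the actual implementation in this pull request.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def two_level_split(field):
    """Hypothetical stand-in for a cascade decomposition: mean level + anomaly."""
    mean_level = np.full_like(field, field.mean())
    return np.stack([mean_level, field - mean_level])


def decompose_timestep(precip_models, t, decompose, max_workers=4):
    """Decompose all members at time step t only, upcasting to float64 first."""
    fields = [
        precip_models[j, t].astype(np.float64) for j in range(precip_models.shape[0])
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decompose, fields))


rng = np.random.default_rng(0)
# Assumed toy input: 4 members, 6 time steps, stored as float32.
precip_models = rng.uniform(0.0, 10.0, size=(4, 6, 32, 32)).astype(np.float32)

for t in range(precip_models.shape[1]):
    # Only the cascades for the current time step exist at any moment.
    cascades = decompose_timestep(precip_models, t, two_level_split)
    # ... the blending for time step t would use `cascades` here ...
```

The trade-off is the one discussed in this thread: the decomposition cost moves into the forecast loop, in exchange for never holding all decomposed time steps in memory at once.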

allow for significantly less memory usage in steps blending (7b52ab5)

mats-knmi requested a review from RubenImhoff September 24, 2024 09:32

mats-knmi added 4 commits September 24, 2024 11:38

apply black (34d90ba)
use threadpool (dabd93a)
only decompose if not already done (4e2758a)
fix black (1a4ab4b)

mats-knmi requested a review from sidekock September 24, 2024 15:29

mats-knmi added 2 commits September 24, 2024 17:40

update docstring (a468411)
fix one more docstring and rename some variables (a5c9b22)

RubenImhoff requested changes Sep 26, 2024

mats-knmi and others added 3 commits September 26, 2024 16:54

Apply suggestions from code review (6dc7efe), Co-authored-by: Ruben Imhoff <[email protected]>
rename precip_models_pm (af0cd15)
move comment about memory reduction to velocity_models (17c00c7)

Merge branch 'master' into memory-efficiency-blending (03bcfc6)

mats-knmi added 2 commits October 1, 2024 08:10

fix zerovalue precip_cascade (3d1366b)
use precip_models_cascade if present (c5718cb)

RubenImhoff reviewed Oct 1, 2024 (pysteps/blending/steps.py)

mats-knmi added 2 commits October 1, 2024 09:33

set done to true (f5e5697)
fix black (e478bc0)

mats-knmi added 2 commits October 2, 2024 17:29

revert to old logic, but now with precip_models_cascade using new loop (7fd7122)
fix issue with precip_models_cascade (6d51fd4)

fix tests (0198a54)

RubenImhoff approved these changes Oct 3, 2024

sidekock reviewed Oct 3, 2024 (pysteps/blending/steps.py)

sidekock reviewed Oct 3, 2024

Move comment up a little (5c14399)
