-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move computation of wet_fac to post_init #22
Conversation
This is great, thanks for the update! Two questions.
|
Thanks for looking at this!
I just tested this. Yes, it works on GPU.
The answer is: I just assumed it would be faster. Turns out it takes pretty much the same time, see this notebook. The good news is that gcm_filters is much faster on GPU than on CPU (as we saw before). Since this new implementation is not faster, we can just disregard this PR. Sorry for the noise! |
Having a closer look at the code in filter.py, I now understand why this PR doesn't lead to a large speedup. In my example, I am filtering 1600 2D fields (100 time slices, 16 layers). For each of the 1600 fields, the Laplacian is applied 8 times (since I chose gcm-filters/gcm_filters/filter.py Lines 245 to 253 in 73baa9e
In summary, while I expected to reduce 1600 * 8 computations to 1 computation, my PR led only to a reduction from 1600 * 8 to 1600 * 1 computations. Since you typically do not choose very high values for |
Thanks for digging into this. It is extremely useful information. Did you run profiling to get it? If so, can you post the profile report here? I think we should keep this PR and think of a way to refactor the kernel so that the initialization only has to be done once. (That can be separate PR and should probably be done by me.) |
I haven't done any rigorous profiling, but saw no significant difference in the execution time. That, together with looking at the code structure, led me to this conclusion. But I could be wrong. I think I will prioritize the implementation of the Laplacian kernels for some of the gcm grids in the near future, but could get back to this performance question at a later time. |
I think you are right. The kernel is initialized inside the function passed to apply_ufunc: gcm-filters/gcm_filters/filter.py Lines 149 to 156 in 8b92035
We should be able to move the initialization ( |
This PR does the following things:
In
CartesianLaplacianWithLandMask
fac
(now:wet_fac
) to pre_init. This factor is independent of the field which the Laplacian is applied to. If we apply the Laplacian many times (as we will for the filtering operation), computing the factor only once will speed things up.out = field.copy()
Following @rabernat's recommendation in #21 (comment), I also disabled the mypy check.