-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix allocations caused due to new contravariant_vectors #12
base: spherical_shell
Are you sure you want to change the base?
Conversation
…iome constant in contravariant_vectors - This reduces the allocations in the interface routines, but causes some allocations in other places where contravariant_vectors is used - TODO: Is there a way to improve this?
Review checklistThis checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging. Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community. |
Note that there are several implementations of forwarding macros. See, e.g., this discussion. They all seem to need a manual specification of the functions that will be forwarded. We could either keep using Lazy. |
Performance comparison: We run the example https://github.com/trixi-framework/Trixi.jl/blob/main/examples/p4est_3d_dgsem/elixir_advection_cubed_sphere.jl with three different implementations.
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 12.4s / 100.0% 43.2GiB / 100.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 206 10.4s 84.2% 50.5ms 41.3GiB 95.6% 205MiB
volume integral 206 9.83s 79.6% 47.7ms 39.5GiB 91.5% 196MiB
interface flux 206 370ms 3.0% 1.80ms 1.58GiB 3.6% 7.83MiB
boundary flux 206 156ms 1.3% 756μs 227MiB 0.5% 1.10MiB
prolong2interfaces 206 10.5ms 0.1% 51.1μs 653KiB 0.0% 3.17KiB
surface integral 206 6.51ms 0.1% 31.6μs 656KiB 0.0% 3.19KiB
Jacobian 206 6.15ms 0.0% 29.9μs 683KiB 0.0% 3.31KiB
reset ∂u/∂t 206 4.09ms 0.0% 19.9μs 476KiB 0.0% 2.31KiB
prolong2boundaries 206 3.29ms 0.0% 16.0μs 646KiB 0.0% 3.13KiB
prolong2mortars 206 2.01ms 0.0% 9.78μs 731KiB 0.0% 3.55KiB
mortar flux 206 1.77ms 0.0% 8.62μs 756KiB 0.0% 3.67KiB
~rhs!~ 206 616μs 0.0% 2.99μs 9.33KiB 0.0% 46.4B
source terms 206 3.96μs 0.0% 19.2ns 0.00B 0.0% 0.00B
calculate dt 42 1.79s 14.5% 42.6ms 1.48GiB 3.4% 36.0MiB
analyze solution 2 164ms 1.3% 81.8ms 411MiB 0.9% 206MiB
I/O 3 5.52ms 0.0% 1.84ms 1.46MiB 0.0% 499KiB
~I/O~ 3 4.13ms 0.0% 1.38ms 136KiB 0.0% 45.3KiB
save solution 2 1.38ms 0.0% 692μs 1.33MiB 0.0% 680KiB
get element variables 2 708ns 0.0% 354ns 0.00B 0.0% 0.00B
save mesh 2 166ns 0.0% 83.0ns 0.00B 0.0% 0.00B
get node variables 2 83.0ns 0.0% 41.5ns 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 100ms / 92.8% 8.83MiB / 97.4%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 206 74.8ms 80.3% 363μs 6.56MiB 76.3% 32.6KiB
volume integral 206 23.8ms 25.5% 115μs 715KiB 8.1% 3.47KiB
interface flux 206 12.7ms 13.6% 61.6μs 714KiB 8.1% 3.46KiB
prolong2interfaces 206 7.41ms 7.9% 36.0μs 652KiB 7.4% 3.17KiB
boundary flux 206 7.33ms 7.9% 35.6μs 676KiB 7.7% 3.28KiB
Jacobian 206 5.83ms 6.3% 28.3μs 683KiB 7.8% 3.32KiB
surface integral 206 5.82ms 6.2% 28.2μs 657KiB 7.5% 3.19KiB
reset ∂u/∂t 206 5.79ms 6.2% 28.1μs 476KiB 5.4% 2.31KiB
prolong2boundaries 206 2.44ms 2.6% 11.9μs 645KiB 7.3% 3.13KiB
prolong2mortars 206 1.78ms 1.9% 8.63μs 734KiB 8.3% 3.56KiB
mortar flux 206 1.76ms 1.9% 8.55μs 760KiB 8.6% 3.69KiB
~rhs!~ 206 206μs 0.2% 999ns 9.33KiB 0.1% 46.4B
source terms 206 2.88μs 0.0% 14.0ns 0.00B 0.0% 0.00B
analyze solution 2 7.69ms 8.3% 3.85ms 596KiB 6.8% 298KiB
I/O 3 6.05ms 6.5% 2.02ms 1.46MiB 17.0% 499KiB
~I/O~ 3 4.59ms 4.9% 1.53ms 136KiB 1.5% 45.3KiB
save solution 2 1.46ms 1.6% 730μs 1.33MiB 15.4% 680KiB
get element variables 2 542ns 0.0% 271ns 0.00B 0.0% 0.00B
save mesh 2 125ns 0.0% 62.5ns 0.00B 0.0% 0.00B
get node variables 2 0.00ns 0.0% 0.00ns 0.00B 0.0% 0.00B
calculate dt 42 4.62ms 5.0% 110μs 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 115ms / 86.7% 8.83MiB / 97.4%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 206 79.9ms 80.3% 388μs 6.56MiB 76.3% 32.6KiB
volume integral 206 25.3ms 25.4% 123μs 717KiB 8.1% 3.48KiB
interface flux 206 13.8ms 13.9% 67.1μs 715KiB 8.1% 3.47KiB
prolong2interfaces 206 8.00ms 8.0% 38.8μs 651KiB 7.4% 3.16KiB
boundary flux 206 7.89ms 7.9% 38.3μs 676KiB 7.7% 3.28KiB
surface integral 206 6.20ms 6.2% 30.1μs 656KiB 7.4% 3.19KiB
Jacobian 206 5.87ms 5.9% 28.5μs 682KiB 7.7% 3.31KiB
reset ∂u/∂t 206 4.62ms 4.6% 22.4μs 476KiB 5.4% 2.31KiB
prolong2boundaries 206 3.15ms 3.2% 15.3μs 644KiB 7.3% 3.13KiB
prolong2mortars 206 2.42ms 2.4% 11.8μs 734KiB 8.3% 3.56KiB
mortar flux 206 2.42ms 2.4% 11.7μs 759KiB 8.6% 3.69KiB
~rhs!~ 206 165μs 0.2% 800ns 9.33KiB 0.1% 46.4B
source terms 206 2.29μs 0.0% 11.1ns 0.00B 0.0% 0.00B
analyze solution 2 8.43ms 8.5% 4.21ms 596KiB 6.8% 298KiB
I/O 3 6.97ms 7.0% 2.32ms 1.46MiB 17.0% 499KiB
~I/O~ 3 4.99ms 5.0% 1.66ms 136KiB 1.5% 45.3KiB
save solution 2 1.98ms 2.0% 989μs 1.33MiB 15.4% 680KiB
get element variables 2 499ns 0.0% 250ns 0.00B 0.0% 0.00B
save mesh 2 249ns 0.0% 124ns 0.00B 0.0% 0.00B
get node variables 2 0.00ns 0.0% 0.00ns 0.00B 0.0% 0.00B
calculate dt 42 4.25ms 4.3% 101μs 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────── |
Why don't you use |
I tried using a |
This PR creates a new array type
ArrayExtended{T, N, NDIMS_SPA}
that extendsArray{T, N}
for the arraycontravariant_vectors
. The additional attributeNDIMS_SPA
stores the number of spatial dimensions of the problem as a compile-time constant, such that it can be accessed in a type-stable way from the functionget_contravariant_vectors()
.We forward
ArrayExtended
into the routines that takeArray
using the macro Lazy.@forward
. This is done (unfortunately) manually for all routines that takecontravariant_vectors
as an argument.