Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix allocations caused due to new contravariant_vectors #12

Open
wants to merge 2 commits into
base: spherical_shell
Choose a base branch
from

Conversation

amrueda
Copy link
Owner

@amrueda amrueda commented Dec 5, 2023

This PR creates a new array type ArrayExtended{T, N, NDIMS_SPA} that extends Array{T, N} for the array contravariant_vectors. The additional attribute NDIMS_SPA stores the number of spatial dimensions of the problem as a compile-time constant, such that it can be accessed in a type-stable way from the function get_contravariant_vectors().

We forward ArrayExtended into the routines that take Array using the macro Lazy.@forward. This is done (unfortunately) manually for all routines that take contravariant_vectors as an argument.

…iome constant in contravariant_vectors

- This reduces the allocations in the interface routines, but causes some allocations in other places where contravariant_vectors is used
- TODO: Is there a way to improve this?
Copy link

github-actions bot commented Dec 5, 2023

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less then 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@amrueda
Copy link
Owner Author

amrueda commented Dec 5, 2023

Note that there are several implementations of forwarding macros. See, e.g., this discussion. They all seem to need a manual specification of the functions that will be forwarded. We could either keep using Lazy.@forward, use another implementation, or just implement our own version of the macro.

@amrueda
Copy link
Owner Author

amrueda commented Dec 5, 2023

Performance comparison: We run the example https://github.com/trixi-framework/Trixi.jl/blob/main/examples/p4est_3d_dgsem/elixir_advection_cubed_sphere.jl with three different implementations.

────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              12.4s / 100.0%           43.2GiB / 100.0%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         206    10.4s   84.2%  50.5ms   41.3GiB   95.6%   205MiB
   volume integral            206    9.83s   79.6%  47.7ms   39.5GiB   91.5%   196MiB
   interface flux             206    370ms    3.0%  1.80ms   1.58GiB    3.6%  7.83MiB
   boundary flux              206    156ms    1.3%   756μs    227MiB    0.5%  1.10MiB
   prolong2interfaces         206   10.5ms    0.1%  51.1μs    653KiB    0.0%  3.17KiB
   surface integral           206   6.51ms    0.1%  31.6μs    656KiB    0.0%  3.19KiB
   Jacobian                   206   6.15ms    0.0%  29.9μs    683KiB    0.0%  3.31KiB
   reset ∂u/∂t                206   4.09ms    0.0%  19.9μs    476KiB    0.0%  2.31KiB
   prolong2boundaries         206   3.29ms    0.0%  16.0μs    646KiB    0.0%  3.13KiB
   prolong2mortars            206   2.01ms    0.0%  9.78μs    731KiB    0.0%  3.55KiB
   mortar flux                206   1.77ms    0.0%  8.62μs    756KiB    0.0%  3.67KiB
   ~rhs!~                     206    616μs    0.0%  2.99μs   9.33KiB    0.0%    46.4B
   source terms               206   3.96μs    0.0%  19.2ns     0.00B    0.0%    0.00B
 calculate dt                  42    1.79s   14.5%  42.6ms   1.48GiB    3.4%  36.0MiB
 analyze solution               2    164ms    1.3%  81.8ms    411MiB    0.9%   206MiB
 I/O                            3   5.52ms    0.0%  1.84ms   1.46MiB    0.0%   499KiB
   ~I/O~                        3   4.13ms    0.0%  1.38ms    136KiB    0.0%  45.3KiB
   save solution                2   1.38ms    0.0%   692μs   1.33MiB    0.0%   680KiB
   get element variables        2    708ns    0.0%   354ns     0.00B    0.0%    0.00B
   save mesh                    2    166ns    0.0%  83.0ns     0.00B    0.0%    0.00B
   get node variables           2   83.0ns    0.0%  41.5ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────
  • The version in this PR reduces the allocations considerably:
────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              100ms /  92.8%           8.83MiB /  97.4%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         206   74.8ms   80.3%   363μs   6.56MiB   76.3%  32.6KiB
   volume integral            206   23.8ms   25.5%   115μs    715KiB    8.1%  3.47KiB
   interface flux             206   12.7ms   13.6%  61.6μs    714KiB    8.1%  3.46KiB
   prolong2interfaces         206   7.41ms    7.9%  36.0μs    652KiB    7.4%  3.17KiB
   boundary flux              206   7.33ms    7.9%  35.6μs    676KiB    7.7%  3.28KiB
   Jacobian                   206   5.83ms    6.3%  28.3μs    683KiB    7.8%  3.32KiB
   surface integral           206   5.82ms    6.2%  28.2μs    657KiB    7.5%  3.19KiB
   reset ∂u/∂t                206   5.79ms    6.2%  28.1μs    476KiB    5.4%  2.31KiB
   prolong2boundaries         206   2.44ms    2.6%  11.9μs    645KiB    7.3%  3.13KiB
   prolong2mortars            206   1.78ms    1.9%  8.63μs    734KiB    8.3%  3.56KiB
   mortar flux                206   1.76ms    1.9%  8.55μs    760KiB    8.6%  3.69KiB
   ~rhs!~                     206    206μs    0.2%   999ns   9.33KiB    0.1%    46.4B
   source terms               206   2.88μs    0.0%  14.0ns     0.00B    0.0%    0.00B
 analyze solution               2   7.69ms    8.3%  3.85ms    596KiB    6.8%   298KiB
 I/O                            3   6.05ms    6.5%  2.02ms   1.46MiB   17.0%   499KiB
   ~I/O~                        3   4.59ms    4.9%  1.53ms    136KiB    1.5%  45.3KiB
   save solution                2   1.46ms    1.6%   730μs   1.33MiB   15.4%   680KiB
   get element variables        2    542ns    0.0%   271ns     0.00B    0.0%    0.00B
   save mesh                    2    125ns    0.0%  62.5ns     0.00B    0.0%    0.00B
   get node variables           2   0.00ns    0.0%  0.00ns     0.00B    0.0%    0.00B
 calculate dt                  42   4.62ms    5.0%   110μs     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────
  • Note that this is comparable to the allocations obtained with the main branch:
────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              115ms /  86.7%           8.83MiB /  97.4%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         206   79.9ms   80.3%   388μs   6.56MiB   76.3%  32.6KiB
   volume integral            206   25.3ms   25.4%   123μs    717KiB    8.1%  3.48KiB
   interface flux             206   13.8ms   13.9%  67.1μs    715KiB    8.1%  3.47KiB
   prolong2interfaces         206   8.00ms    8.0%  38.8μs    651KiB    7.4%  3.16KiB
   boundary flux              206   7.89ms    7.9%  38.3μs    676KiB    7.7%  3.28KiB
   surface integral           206   6.20ms    6.2%  30.1μs    656KiB    7.4%  3.19KiB
   Jacobian                   206   5.87ms    5.9%  28.5μs    682KiB    7.7%  3.31KiB
   reset ∂u/∂t                206   4.62ms    4.6%  22.4μs    476KiB    5.4%  2.31KiB
   prolong2boundaries         206   3.15ms    3.2%  15.3μs    644KiB    7.3%  3.13KiB
   prolong2mortars            206   2.42ms    2.4%  11.8μs    734KiB    8.3%  3.56KiB
   mortar flux                206   2.42ms    2.4%  11.7μs    759KiB    8.6%  3.69KiB
   ~rhs!~                     206    165μs    0.2%   800ns   9.33KiB    0.1%    46.4B
   source terms               206   2.29μs    0.0%  11.1ns     0.00B    0.0%    0.00B
 analyze solution               2   8.43ms    8.5%  4.21ms    596KiB    6.8%   298KiB
 I/O                            3   6.97ms    7.0%  2.32ms   1.46MiB   17.0%   499KiB
   ~I/O~                        3   4.99ms    5.0%  1.66ms    136KiB    1.5%  45.3KiB
   save solution                2   1.98ms    2.0%   989μs   1.33MiB   15.4%   680KiB
   get element variables        2    499ns    0.0%   250ns     0.00B    0.0%    0.00B
   save mesh                    2    249ns    0.0%   124ns     0.00B    0.0%    0.00B
   get node variables           2   0.00ns    0.0%  0.00ns     0.00B    0.0%    0.00B
 calculate dt                  42   4.25ms    4.3%   101μs     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

@ranocha
Copy link

ranocha commented Dec 5, 2023

Why don't you use PtrArray instead like we do in Trixi.wrap_array?

@amrueda
Copy link
Owner Author

amrueda commented Dec 7, 2023

Why don't you use PtrArray instead like we do in Trixi.wrap_array?

I tried using a PtrArray for contravariant_vectors in this PR. However, for some reason, that does not get rid of the allocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants