Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDBetteridge/update caching #3730

Merged
merged 21 commits into from
Oct 9, 2024
Merged

JDBetteridge/update caching #3730

merged 21 commits into from
Oct 9, 2024

Conversation

JDBetteridge
Copy link
Member

@JDBetteridge JDBetteridge commented Aug 12, 2024

Description

Necessary caches replaced with parallel safe ones.

Depends on PyOP2 #724 going in first.

The PyOP2 caches are used in 3 key places:

  • tsfc_interface.py: TSFC no longer inherits from the (removed) Cached object and the tsfc_compile_form function is now a cached function.
  • slate/slac/compiler.py: SlateKernel still inherits from TSFCKernel, the compile_expression function is now cached
  • interpolation.py: The compile_expression function is cached with the new decorator.

The tests have been updated to run with the new PYOP2_SPMD_STRICT environment variable set to 1. Now everything passes without deadlocking, with the exception of two tests utilising ngsPETSc, which requires an upstream fix.

Testing performance has been significantly improved, but this isn't apparent on CI (I'm still not 100% sure why). Results are as follows:

Commands

Run on my workstation:

firedrake-clean; yes | pyop2-clean
time env PYOP2_CACHE_INFO=1 mprof run --include-children pytest -n 16 --dist=worksteal --durations 20 -v tests; mprof peak

Current Firedrake master

(export REV="Old")
cold
Output:

=== slowest 20 durations ====================================================================================
141.19s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Box]
116.33s call     tests/external_operators/test_external_operators_adjoint.py::test_translation_operator_inverse_problem
109.21s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[sparse]
104.67s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[per_cell]
101.03s call     tests/regression/test_fdm.py::test_p_independence_hdiv[Box]
95.72s call     tests/demos/test_notebooks_run.py::test_notebook_runs[10-sum-factorisation.ipynb]
80.05s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[True]
77.77s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[dense]
74.48s call     tests/demos/test_notebooks_run.py::test_notebook_runs[09-hybridisation.ipynb]
72.14s call     tests/regression/test_stress_els.py::test_stress_displacement_convergence[conforming]
70.72s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[per_cell]
68.52s call     tests/demos/test_notebooks_run.py::test_notebook_runs[12-HPC_demo.ipynb]
63.96s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh[spheresphere]
57.30s call     tests/regression/test_fdm.py::test_ipdg_direct_solver[rt-Box]
53.57s call     tests/regression/test_linesmoother.py::test_linesmoother[petscasm-Quad]
50.84s call     tests/regression/test_fdm.py::test_p_independence_hgrad[Box-fdm]
49.87s call     tests/multigrid/test_hiptmair.py::test_hiptmair_hcurl[hexahedron-aij]
49.66s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Rectangle]
47.91s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[False]
47.69s call     tests/slate/test_hdg_poisson.py::test_hdg_convergence[1-True-1.75]

real    20m33.434s
user    481m20.481s
sys     49m7.885s

mprofile_20240823224837.dat     43321.184 MiB

Old_cold

This branch

(export REV="Experiment")
cold
Output:

=== slowest 20 durations ====================================================================================
114.55s call     tests/external_operators/test_external_operators_adjoint.py::test_translation_operator_inverse_problem
112.40s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[sparse]
110.55s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Box]
87.76s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[per_cell]
85.13s call     tests/demos/test_notebooks_run.py::test_notebook_runs[10-sum-factorisation.ipynb]
78.62s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[dense]
75.13s call     tests/regression/test_fdm.py::test_p_independence_hdiv[Box]
71.45s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[per_cell]
68.81s call     tests/demos/test_notebooks_run.py::test_notebook_runs[12-HPC_demo.ipynb]
64.85s call     tests/equation_bcs/test_equation_bcs.py::test_EquationBC_mixedpoisson_matfree_fieldsplit
64.35s call     tests/regression/test_stress_els.py::test_stress_displacement_convergence[conforming]
64.03s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[True]
62.70s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh[spheresphere]
55.62s call     tests/equation_bcs/test_equation_bcs.py::test_EquationBC_poisson_matfree[True]
54.31s call     tests/demos/test_notebooks_run.py::test_notebook_runs[09-hybridisation.ipynb]
53.12s call     tests/regression/test_fdm.py::test_ipdg_direct_solver[rt-Box]
43.07s call     tests/output/test_io_function.py::test_io_function_base[cell_family_degree21]
39.91s call     tests/regression/test_linesmoother.py::test_linesmoother[petscasm-Quad]
39.38s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[dense]
38.17s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh_parallel[spheresphere]

real    15m9.087s
user    330m19.724s
sys     38m52.005s

mprofile_20240823234054.dat     31183.383 MiB

Experiment_cold

I also ran with a warm cache with similar levels of improvement, but this isn't as relevant (to CI).

Copy link
Member

@dham dham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a couple of comments but I think I need talking through this.

firedrake/slate/slac/compiler.py Outdated Show resolved Hide resolved
# Drop prefix as it's only used for naming and log
# JBTODO: Can't drop prefix as tests/slate/test_optimise.py::test_partially_optimised fails, investigate
# it looks like the prefix is being used to create different subkernels, which conflicts with the docstring below
return default_parallel_hashkey(form.signature(), prefix, parameters, interface, diagonal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the change. Why are we allowed to drop coefficient_numbers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coefficient_numbers is actually derived directly from form so it doesn't need to be included. Also this cache now sits at a slightly different point in the stack so coefficient_numbers isn't even an argument to the wrapped function.

Copy link

github-actions bot commented Oct 2, 2024

TestsPassed ✅Skipped ⏭️Failed ❌
Firedrake complex8050 ran6468 passed1582 skipped0 failed

Copy link

github-actions bot commented Oct 2, 2024

TestsPassed ✅Skipped ⏭️Failed ❌
Firedrake real8056 ran7270 passed786 skipped0 failed

dham
dham previously approved these changes Oct 3, 2024
@JDBetteridge
Copy link
Member Author

I think this one is ready to go (once tests pass of course!)

@dham dham merged commit 4e6cba3 into master Oct 9, 2024
7 of 8 checks passed
@dham dham deleted the JDBetteridge/update_caching branch October 9, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants