Test suite clean up #3385

JDBetteridge · 2024-02-02T16:52:43Z

Description

This PR started as an experiment to "cheaply" speed up the test suite by having mpiexec wrap pytest, rather than having pytest fork a subprocess which calls mpiexec (which is also problematic for other reasons).
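To make the two launch styles concrete, here is a rough sketch of the command lines involved. The helper names are hypothetical, and the exact command that the old subprocess-based runner builds is an assumption; only the `parallel[N] and not broken` marker expression is taken from the workflow snippet quoted later in this thread.

```python
import shlex


def inner_launch(test_id, nprocs):
    """Old style (assumed shape): a serial pytest session forks one
    mpiexec subprocess for every parallel test it encounters."""
    return ["mpiexec", "-n", str(nprocs), "python", "-m", "pytest", test_id]


def outer_launch(nprocs, test_dir="tests"):
    """New style: a single mpiexec wraps one pytest session, which then
    runs every test marked for this world size."""
    marker = f"parallel[{nprocs}] and not broken"
    return ["mpiexec", "-n", str(nprocs), "python", "-m", "pytest",
            "-m", marker, "-v", test_dir]


if __name__ == "__main__":
    # Print a shell-quoted version of the "outside" launch for 3 ranks.
    print(shlex.join(outer_launch(3)))
```

With the outer style, MPI start-up is paid once per world size instead of once per test, which is where the "cheap" speed-up would come from.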

This PR now carries several test suite fixes that should be merged back to master, including:

Adding comm arguments to function calls that need them.
Freeing comms that are created.
Disabling a test that pollutes the tape.
"Fixing" ensemble parallel tests by using the simple partitioner (just in tests, Ensemble needs a proper fix!)
This work had to be rebased on JDBetteridge/update caching #3730 and uses PyOP2 #724 and FInAT #134 due to the deadlocks that they call.
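As an illustration of the "freeing comms that are created" item, one pattern is to wrap the duplicate in a context manager so that it is always freed, even when a test fails. This is only a sketch and not necessarily what the PR does: `temporary_comm` is a hypothetical helper, and `FakeComm` is a stand-in that mimics the `Dup()`/`Free()` methods of an mpi4py communicator so the example runs without MPI.

```python
from contextlib import contextmanager


@contextmanager
def temporary_comm(parent):
    """Duplicate `parent` and guarantee the duplicate is freed on exit."""
    comm = parent.Dup()
    try:
        yield comm
    finally:
        comm.Free()


class FakeComm:
    """Stand-in for a communicator (hypothetical, not an mpi4py class)."""

    def __init__(self):
        self.freed = False

    def Dup(self):
        return FakeComm()

    def Free(self):
        self.freed = True


if __name__ == "__main__":
    world = FakeComm()
    with temporary_comm(world) as comm:
        assert not comm.freed  # usable inside the block
    assert comm.freed          # released afterwards
    assert not world.freed     # the parent is left alone
```

In a real test, `parent` would be something like `MPI.COMM_WORLD` from mpi4py.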

We need to consider what aspects of this experiment we want to incorporate back into master.

Some timings for the actual speed-up (the original intention):

Results

(Real only)

Master

This week's scheduled execution:

Total (inc install): 50m 45s

This branch

With fixed caches, mpispawn, fixed FInAT hashes and pytest-split based on a timed execution.
NB: We tweak vertexonly/test_poisson_inverse_conductivity.py to only do 3 iterations (see diff)

Serial: 17m51s
2: 2m59s
3: 6m43s
4: 45s
6: 19s
7: 48s
8: 12s
Total (inc install): 46m 6s

Important: this branch runs a maximum of 12 ranks/threads!
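For reference, the arithmetic behind the figures above can be checked with a few lines of Python. The numbers are copied from this comment; summing the per-world-size stages assumes they run one after another, which is not stated here.

```python
import re


def to_seconds(text):
    """Convert durations such as '17m51s', '50m 45s' or '45s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


def fmt(seconds):
    """Format a number of seconds as e.g. '29m37s'."""
    return f"{seconds // 60}m{seconds % 60:02d}s"


MASTER_TOTAL = to_seconds("50m 45s")
BRANCH_TOTAL = to_seconds("46m 6s")
BRANCH_STAGES = {
    "serial": "17m51s", "2": "2m59s", "3": "6m43s", "4": "45s",
    "6": "19s", "7": "48s", "8": "12s",
}

# Time spent in the listed test stages (assumes sequential execution).
stage_sum = sum(to_seconds(t) for t in BRANCH_STAGES.values())
# Difference between the two "Total (inc install)" figures.
saved = MASTER_TOTAL - BRANCH_TOTAL

if __name__ == "__main__":
    print("test stages on this branch:", fmt(stage_sum))
    print("saved vs master:", fmt(saved),
          f"({100 * saved / MASTER_TOTAL:.1f}%)")
```

Under that assumption the listed stages add up to 29m37s of the 46m 6s total, and the branch finishes 4m39s (about 9%) sooner than the scheduled master run.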

connorjward · 2024-02-02T17:38:33Z

~~This is cool, but isn't it a bad idea to effectively remove test coverage? If CI doesn't run all the tests no one will.~~

~~I can see this being useful in the context of a bigger change where we run the test suite with a number of Firedrake configurations and only one of them would run these slow tests.~~

tests/regression/test_ensembleparallelism.py

.github/workflows/build.yml

github-actions · 2024-09-12T13:53:27Z

| | Tests | Passed ✅ | Skipped ⏭️ | Failed ❌ |
|---|---|---|---|---|
| Firedrake complex | 8067 ran | 6423 passed | 1644 skipped | 0 failed |

github-actions · 2024-09-12T13:59:51Z

| | Tests | Passed ✅ | Skipped ⏭️ | Failed ❌ |
|---|---|---|---|---|
| Firedrake real | 8042 ran | 7224 passed | 818 skipped | 0 failed |

connorjward

Generally very happy with this.

.github/workflows/build.yml

.test_durations

firedrake/parameters.py

firedrake/slate/slac/compiler.py

firedrake/tsfc_interface.py

tests/demos/test_demos_run.py

tests/output/test_io_mesh.py

tests/slate/test_hdg_poisson.py

.github/workflows/build.yml

firedrake/tsfc_interface.py

connorjward · 2024-10-18T14:25:23Z

.github/workflows/build.yml

          python "$(which firedrake-clean)"
          python -m pip install \
-            pytest-xdist pytest-timeout ipympl
+            pytest-xdist pytest-timeout ipympl pytest-split
+          pip install git+https://github.com/JDBetteridge/mpispawn


Before we merge this we should probably move mpispawn into the firedrakeproject organisation, or pin it to a version or something?

.github/workflows/build.yml

connorjward

Leaving notes for someone (most likely me) to refer to in future. In summary:

Need to rebase/merge in master.
Tweaks to Makefile and build.yml.

connorjward · 2024-10-31T14:14:07Z

.github/workflows/build.yml

+            -o faulthandler_timeout=1860 \
+            --junit-xml=firedrake2_\$MPISPAWN_TASK_ID1.xml \
+            -m "parallel[\$MPISPAWN_WORLD_SIZE] and not broken" \
+            -v tests


TODO: "dogfood" (bleh) Makefile and use a matrix to massively cut down on boilerplate
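One way to picture the matrix idea is to generate the per-job settings from data instead of spelling each step out. The sketch below is purely illustrative: `matrix_entries` and the report file naming are hypothetical, the world sizes are the ones listed in the timings in the description, and the real/complex modes come from the CI result tables.

```python
import json

# Parallel world sizes that appear in the timings in the PR description.
WORLD_SIZES = (2, 3, 4, 6, 7, 8)


def matrix_entries(world_sizes, modes=("real", "complex")):
    """Return one entry per (mode, world size) pair, each with its own
    report file so that parallel jobs cannot overwrite each other."""
    return [
        {
            "mode": mode,
            "nprocs": nprocs,
            "marker": f"parallel[{nprocs}] and not broken",
            "junit_xml": f"firedrake_{mode}_np{nprocs}.xml",
        }
        for mode in modes
        for nprocs in world_sizes
    ]


if __name__ == "__main__":
    print(json.dumps(matrix_entries(WORLD_SIZES), indent=2))
```

Each entry would then feed a single templated step, so adding a world size becomes a one-line change.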

connorjward · 2024-10-31T14:17:08Z

Makefile

+.PHONY: test_smoke
+test_smoke:
+	@echo "    Running the bare minimum smoke tests"
+	@python -m pytest -k "poisson_strong or stokes_mini or dg_advection" -v tests/regression/


It would be better to use MPI on the "outside" here for the parallel tests so this can be run to check things on HPC

connorjward · 2024-10-31T14:17:43Z

Makefile

-endif
+# Requires pytest and pytest-mpi only
+.PHONY: test_serial
+test_serial:


Terrible name! This runs all the parallel tests too!

connorjward · 2024-10-31T14:18:39Z

Makefile

+
+# Requires pytest and pytest-mpi only
+.PHONY: test_smoke
+test_smoke:


bikeshedding: I prefer make smoke_tests or make smoketests

connorjward · 2024-10-31T14:19:28Z

Makefile

+	done
+
+.PHONY: _test_large_world_test
+_test_large_world_tests:


I'm not sure why we have small_world and large_world tests separately.

connorjward · 2024-10-31T14:20:15Z

firedrake/slate/slac/kernel_builder.py

@@ -159,7 +159,11 @@ def collect_tsfc_kernel_data(self, mesh, tsfc_coefficients, tsfc_constants, wrap

        # Pick the constants associated with a Tensor()/TSFC kernel
        tsfc_constants = tuple(tsfc_constants[i] for i in kinfo.constant_numbers)
-        kernel_data.extend([(c, c.name) for c in wrapper_constants if c in tsfc_constants])
+        kernel_data.extend([


This needs master merging in, as these changes have now been merged there.

JDBetteridge added the DO NOT MERGE label Feb 2, 2024

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from ecb9628 to f3685b7 Compare February 9, 2024 19:21

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from f3685b7 to f848c6f Compare March 5, 2024 13:40

JDBetteridge force-pushed the JDBetteridge/faster_tests branch 2 times, most recently from 93a0aae to a20fc9d Compare June 7, 2024 13:52

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from a5f614f to 27b9af3 Compare July 19, 2024 16:25

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from 2a7f468 to 6ed1774 Compare August 18, 2024 13:44

JDBetteridge force-pushed the JDBetteridge/faster_tests branch 2 times, most recently from 8132b64 to 1122a3b Compare August 31, 2024 18:37

JDBetteridge added the enhancement and performance labels and removed the DO NOT MERGE label Sep 4, 2024

JDBetteridge self-assigned this Sep 4, 2024

JDBetteridge added the bug label Sep 4, 2024

JHopeCollins requested changes Sep 10, 2024

tests/regression/test_ensembleparallelism.py

JDBetteridge commented Sep 11, 2024

.github/workflows/build.yml (outdated)

JDBetteridge force-pushed the JDBetteridge/faster_tests branch 2 times, most recently from 9d5f056 to df4aea3 Compare September 12, 2024 13:04

JDBetteridge marked this pull request as ready for review September 24, 2024 14:23

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from bf317f5 to ef87021 Compare October 2, 2024 21:57

connorjward changed the title ~~Mark and skip slow tests~~ Test suite clean up Oct 3, 2024

connorjward reviewed Oct 3, 2024

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from 8435a69 to 10f7f0c Compare October 8, 2024 14:02

JDBetteridge and others added 4 commits October 10, 2024 15:38

Draft changes to pyop2.caching

fcdeace

Change package branch

056ba30

Just notes

314ff8e

WIP

3f272d6

JDBetteridge and others added 11 commits October 10, 2024 15:38

Add slow tests back in

1b16177

Only do 3 iterations

40fe506

How did this ever work before!? (complex mode)

ea0ebf6

Try new FInAT hashes

fc7a2aa

Fix disk heckpointing test.

4d35c8d

linting test

def2665

Update .github/workflows/build.yml

33104ca

Change tests for reporting

7833adc

More smothing to improve solutions

ae6a442

Try to prevent pytest overwriting xml files in parallel

0f31713

Dog food flavoured makefile

f493334

JDBetteridge force-pushed the JDBetteridge/faster_tests branch from 10f7f0c to f493334 Compare October 10, 2024 14:38

JDBetteridge commented Oct 10, 2024

.github/workflows/build.yml (outdated)

Remove package branch

8cee812

JDBetteridge linked an issue Oct 10, 2024 that may be closed by this pull request

INSTALL: Tests not passing on fresh install M2 Mac #3793

Open

JDBetteridge commented Oct 15, 2024

firedrake/tsfc_interface.py (outdated)

JDBetteridge added 6 commits October 15, 2024 15:50

Error in loop on failure

c3dafc5

Give mesh session scope

b9205bb

Apply reviewers comments

9ec599e

Icosahedral radial mesh appears fixed?

2b8938c

Remove duplication from rebase

d26ea36

Test to see if Constant magically works again

fd3f337

JDBetteridge mentioned this pull request Oct 16, 2024

BUG: Use of constant in tests/slate/test_hdg_poisson.py::test_hdg_convergence causes errors #3802

Closed

connorjward reviewed Oct 18, 2024

JDBetteridge mentioned this pull request Oct 21, 2024

BUG: Performance regression on CI #3603

Open

Fix constant numbering in SLATE

200682f

JDBetteridge commented Oct 21, 2024

.github/workflows/build.yml (outdated)

JDBetteridge and others added 2 commits October 21, 2024 16:47

Mark another slow demo

6d8c93c

Stop using pytest-mpi branch

130b7b8

connorjward requested changes Oct 31, 2024
