Use separate reference images per backend #3520

Merged: 19 commits merged into master on Jan 1, 2024
Conversation

jkrumbiegel
Member

The git repo approach is a bit complex to set up, and it has the additional issue that you cannot update or see images that haven't failed. So for now I'm putting that on hold.

So in this PR, each backend creates its own subfolder for reference images. The special GLMakie tests are merged into a single set with the general tests by allowing more than one set of reference tests to be specified at once:

@include_reference_tests GLMakie "refimages.jl" joinpath(@__DIR__, "glmakie_refimages.jl")

Therefore, there is just one set of refimages left, called reference_images.tar. I was already able to upload this tar file to the 0.20 release for testing this PR.

Because we don't fail if there are reference images for which no tests exist, it should not matter to each individual backend that the reference image tar contains data for all backends at once. We can merge the three recorded folders into one after all backend tests have run; this artifact could then be used later when rewriting the reference updater so that it can update all three backends at once.
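
As a rough illustration of what that merge step might look like (a minimal sketch; the folder layout, names, and the tarball name are assumptions, not the actual test code):

```julia
# Hypothetical sketch of merging the per-backend recorded folders into one
# artifact after all backend test jobs have run. Folder names are assumed.
using Tar

function merge_recorded_folders(backends = ["GLMakie", "CairoMakie", "WGLMakie"];
                                out = "recorded_all")
    mkpath(out)
    for backend in backends
        src = joinpath("recorded", backend)   # assumed layout: one subfolder per backend
        isdir(src) || continue
        cp(src, joinpath(out, backend); force = true)
    end
    # Pack the merged directory into a single tarball that CI can store,
    # e.g. as input for a reference updater that handles all backends at once.
    Tar.create(out, "recorded_all.tar")
end
```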

@MakieBot
Collaborator

MakieBot commented Dec 28, 2023

Compile Times benchmark

Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time, and a negative percentage means faster than the base branch. Note that GLMakie + WGLMakie run on an emulated GPU, so the runtime benchmark is much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(fig)

|            | using | create (compile) | display (compile) | create (runtime) | display (runtime) |
|------------|-------|------------------|-------------------|------------------|-------------------|
| GLMakie    | 3.43s (3.39, 3.46) 0.02+- | 383.45ms (380.10, 386.73) 2.37+- | 467.51ms (463.04, 471.84) 3.33+- | 7.17ms (7.09, 7.27) 0.08+- | 25.30ms (25.26, 25.33) 0.03+- |
| master     | 3.42s (3.39, 3.44) 0.01+- | 383.97ms (381.16, 385.52) 1.39+- | 473.62ms (465.78, 512.24) 17.07+- | 7.17ms (7.09, 7.25) 0.05+- | 25.30ms (25.22, 25.40) 0.06+- |
| evaluation | 1.00x invariant, 0.02s (0.77d, 0.18p, 0.02std) | 1.00x invariant, -0.51ms (-0.26d, 0.63p, 1.88std) | 1.01x invariant, -6.11ms (-0.50d, 0.39p, 10.20std) | 1.00x invariant, -0.0ms (-0.01d, 0.99p, 0.06std) | 1.00x invariant, -0.0ms (-0.07d, 0.90p, 0.04std) |
| CairoMakie | 2.99s (2.96, 3.00) 0.02+- | 314.53ms (311.77, 317.79) 2.08+- | 139.05ms (137.54, 140.00) 0.77+- | 6.93ms (6.85, 7.00) 0.06+- | 600.33μs (594.11, 606.79) 4.07+- |
| master     | 2.99s (2.97, 3.02) 0.02+- | 312.84ms (310.24, 314.07) 1.57+- | 138.59ms (137.36, 139.60) 0.81+- | 6.99ms (6.93, 7.07) 0.05+- | 594.94μs (590.65, 602.85) 4.23+- |
| evaluation | 1.00x invariant, -0.0s (-0.11d, 0.84p, 0.02std) | 0.99x invariant, 1.7ms (0.92d, 0.11p, 1.83std) | 1.00x invariant, 0.46ms (0.59d, 0.29p, 0.79std) | 1.01x invariant, -0.06ms (-1.16d, 0.05p, 0.05std) | 0.99x slower X, 5.39μs (1.30d, 0.03p, 4.15std) |
| WGLMakie   | 3.72s (3.63, 3.83) 0.06+- | 456.66ms (320.41, 563.02) 122.30+- | 9.20s (9.07, 9.29) 0.08+- | 9.80ms (9.27, 12.47) 1.18+- | 73.03ms (68.00, 79.90) 5.32+- |
| master     | 3.75s (3.72, 3.78) 0.02+- | 327.11ms (321.68, 334.87) 4.37+- | 9.18s (9.11, 9.28) 0.05+- | 9.33ms (9.17, 9.47) 0.11+- | 72.34ms (67.57, 79.39) 4.75+- |
| evaluation | 1.01x invariant, -0.03s (-0.59d, 0.31p, 0.04std) | 0.72x slower❌, 129.54ms (1.50d, 0.03p, 63.33std) | 1.00x invariant, 0.01s (0.16d, 0.78p, 0.07std) | 0.95x invariant, 0.47ms (0.56d, 0.33p, 0.65std) | 0.99x invariant, 0.69ms (0.14d, 0.80p, 5.03std) |

@jkrumbiegel
Member Author

jkrumbiegel commented Dec 29, 2023

This PR works now; what remains is to update the 17 missing images using ReferenceUpdater and to adjust the score threshold to something that works for us. Right now, a lot of tests fail because I generated the reference images on my MacBook and the threshold is set quite low at 0.001 (for every backend, because they all have their own images now). Especially WGLMakie seems to have differences here and there; mostly they look like very slight shifts. Sometimes it looks like a z-fighting mess, as here:

[images: two WGLMakie comparison screenshots showing z-fighting-like differences]

So @SimonDanisch and @ffreyer, it would probably be good if you checked out the images as well, to see if something clearly needs to be fixed. Otherwise we can update the reference_images.tar through ReferenceUpdater run from this PR's branch, and then merge whenever we like.
Ah, and we should probably replace all mp4s with stepper pngs, because mp4 compression introduces additional noise which might differ slightly depending on the host system.
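
For context, the score threshold works roughly like this; a minimal sketch using a root-mean-square pixel difference, where the function names and the exact formula are illustrative rather than Makie's actual ReferenceTests implementation:

```julia
# Illustrative sketch of a reference-image comparison against a score threshold.
# This is not the actual ReferenceTests scoring code.
using Images, FileIO

function difference_score(recorded_path, reference_path)
    a = float.(load(recorded_path))
    b = float.(load(reference_path))
    size(a) == size(b) || return Inf           # size mismatch always fails
    # Root-mean-square difference over all color channels.
    sqrt(sum(abs2, channelview(a) .- channelview(b)) / length(channelview(a)))
end

# With a threshold as low as 0.001, tiny cross-platform rendering differences
# (sub-pixel shifts, antialiasing, mp4 compression noise) can already fail a test.
passes(recorded, reference; threshold = 0.001) =
    difference_score(recorded, reference) <= threshold
```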

@jkrumbiegel
Member Author

@DanielVandH this PR now only fails on a triplot reference image, and only on 1.6. I can't see the generated image right now because those artifacts aren't saved, but I think this must be due to RNG differences. Any idea what could be going on?

@DanielVandH
Contributor

@jkrumbiegel it would have to be due to an RNG issue if anything. Were any new tests inserted between the existing triangulation tests since they were added? That would disrupt the RNG order.

@jkrumbiegel
Member Author

@DanielVandH I don't think so, no. The tests work fine on Julia 1 but not on 1.6. But the point of this PR was to increase the test sensitivity, so it's entirely possible the 1.6 version was always broken but just flew under the radar. Aren't we using a stable RNG so that we don't have the problem of different random numbers in different Julia versions? Maybe there's one aspect of that plotting function that doesn't fully comply with that; at least currently I have no other hypothesis.
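
(As background, this is the pattern a stable RNG relies on; a minimal sketch assuming StableRNGs.jl, with the seed and the `RNG` name chosen for illustration rather than taken from Makie's test suite:)

```julia
# Sketch: StableRNG produces the same random stream on every Julia version,
# unlike the default task-local RNG, whose algorithm changed between releases.
using StableRNGs

const RNG = StableRNG(123)   # illustrative seed; the tests' actual seed may differ

# Tests that need randomness should draw from this RNG explicitly:
points = rand(RNG, 100)      # identical values on 1.6 and on current Julia

# Caveat: because the stream is shared, inserting a new rand(RNG, ...) call
# before an existing test shifts all subsequent draws and changes its output.
```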

@DanielVandH
Contributor

> The tests work fine on Julia 1 but not on 1.6

Is this always the case? It's possible that the RNG only matters for a few triangles and so isn't always seen (probably not, though).

If there were no other uses of the STABLE_RNG in between that would change the RNG order, then I guess it might've always just been broken. I'd have to see the images to consider it in more detail.

@jkrumbiegel
Member Author

> I'd have to see the images to consider it in more detail

I've just pushed a change so that 1.6 also stores its reference images as an artifact again; I thought we didn't need it and was immediately proven wrong :D

@jkrumbiegel
Member Author

jkrumbiegel commented Jan 1, 2024

Ok, here we go @DanielVandH. The reference on Julia 1:

[image: triplot reference image generated on Julia 1]

and the recorded image on 1.6:

[image: triplot image recorded on Julia 1.6]

There are differences at the top border.

@DanielVandH
Contributor

Weird... there are more points being inserted onto the boundary. That means the RNG for the mesh refinement is not working or has changed somehow. I can try to take a look, but I haven't really had time for any coding this past month, so I'm not sure when I can take a proper look. This test is not too crucial, though; other tests also use a custom bounding box / holes, which seems to be what this test was for, in case you want to just remove it.

@jkrumbiegel
Member Author

Ok I'll temporarily disable it then, thanks!
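
(One way to disable it temporarily might look like the following sketch; the test name and the version guard are illustrative, not the actual lines in the repo:)

```julia
# Illustrative sketch: guard the flaky triplot reference test behind a version
# check until the RNG difference on Julia 1.6 is understood.
if VERSION >= v"1.7"
    @reference_test "Triplot with custom bounding box" begin
        # ... original test body ...
    end
end
```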

jkrumbiegel merged commit 3f6f0f7 into master on Jan 1, 2024
17 checks passed
jkrumbiegel deleted the jk/reftests-per-backend branch on January 1, 2024 at 16:50