Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability to handle with node variables #67

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

bennibolm
Copy link
Contributor

@bennibolm bennibolm commented Aug 10, 2023

This PR adds the functionality to handle node-level data beside the solution itself.
This is required to visualize subcell limiting coefficients, as used in #1476 and #1502.

It comes together with the PR to Trixi, which saves the coefficients in the .h5 files.

TODO:

  • Adapt test to "new" testing structure (compare value and not only check whether the code runs through). Check this after the Trixi PR is merged.

Question:

  • Useful to allow reinterpolation for subcell blending coefficients? Actually not, but it would be very annoying to always handle two files. Added a comment.

@bennibolm bennibolm closed this Oct 23, 2023
@bennibolm bennibolm reopened this Oct 23, 2023
@bennibolm
Copy link
Contributor Author

The current implementation supports both reinterpolation and no reinterpolation for the subcell limiting coefficient. That's the way it is handled for the solution itself.
Does it even make sense to allow reinterpolation for this coefficient? What do you think? @amrueda @andrewwinters5000 @sloede

I already prepared an PR to Trixi2Vtk_reference_files to add the needed .vtu files for testing. I will open the PR when this discussion here has been decided.

@sloede
Copy link
Member

sloede commented Oct 23, 2023

Does it even make sense to allow reinterpolation for this coefficient?

IMHO, no. It's like an AMR marker (should this cell be refined or not), which is not a continuous field that happens to be represented by a piecewise-constant approximation. Thus reinterpoaltion does not give a meaningful representation of reality but can actually be harmful.

Having said that, I am not sure how easy/annoying it is to always create two files, once once with reinterpolation (for the solution itself) and once without (for the node variables). It might thus be nice to have the ability for reinterpolation, just so you can have only one VTK file for a quick'n'dirty visualization.

What we really need/want though, is a ParaView plugin that can read in our HDF5 natively without us having to convert to VTK first. But that's for another day I guess :-)

@amrueda
Copy link

amrueda commented Oct 23, 2023

It might thus be nice to have the ability for reinterpolation, just so you can have only one VTK file for a quick'n'dirty visualization.

I agree with @sloede

@bennibolm
Copy link
Contributor Author

bennibolm commented Oct 23, 2023

Does it even make sense to allow reinterpolation for this coefficient?

IMHO, no. It's like an AMR marker (should this cell be refined or not), which is not a continuous field that happens to be represented by a piecewise-constant approximation. Thus reinterpoaltion does not give a meaningful representation of reality but can actually be harmful.

I thought so too. But I implemented the reinterpolation of the coefficients before it was possible to not reinterpolate the solution and just kept the code after the update.

Having said that, I am not sure how easy/annoying it is to always create two files, once once with reinterpolation (for the solution itself) and once without (for the node variables). It might thus be nice to have the ability for reinterpolation, just so you can have only one VTK file for a quick'n'dirty visualization.

I'm not sure that I understand your thoughts correctly. I don't think we need two different files here either way. It should be enough to just delete the implication of parameter reinterpolation for the subcell stuff.

EDIT: Oh, I now see the problem. I didn't realized that it's a problem to save reinterpolated and non-reinterpolated node-level data in one .vtu file.

What we really need/want though, is a ParaView plugin that can read in our HDF5 natively without us having to convert to VTK first. But that's for another day I guess :-)

😌

@amrueda
Copy link

amrueda commented Oct 23, 2023

I'm not sure that I understand your thoughts correctly. I don't think we need two different files here either way. It should be enough to just delete the implication of parameter reinterpolation for the subcell stuff.

No, I don't think that's a good idea. In some situations, it might still be advantageous to reinterpolate the solution variables to get a nice and smooth representation of the (part of the field that is a pure) DG solution.

@bennibolm
Copy link
Contributor Author

bennibolm commented Oct 23, 2023

Having said that, I am not sure how easy/annoying it is to always create two files, once once with reinterpolation (for the solution itself) and once without (for the node variables). It might thus be nice to have the ability for reinterpolation, just so you can have only one VTK file for a quick'n'dirty visualization.

I'm not sure that I understand your thoughts correctly. I don't think we need two different files here either way. It should be enough to just delete the implication of parameter reinterpolation for the subcell stuff.

No, I don't think that's a good idea. In some situations, it might still be advantageous to reinterpolate the solution variables to get a nice and smooth representation of the (part of the field that is a pure) DG solution.

After understanding the problem correctly 😅 , I will keep the current implementation for now and add a warning in that case, if that's okay for you.

@andrewwinters5000
Copy link
Member

I also think that it not very "meaningful" to reinterpolate the node-wise alpha values because it could lead to misinterpretation (as already pointed out by the others). For the element-wise variant we can get away with this re-interpolation because the alphas are constant within each element, so it does not change the values.

Having said that, I am not sure how easy/annoying it is to always create two files, once once with reinterpolation (for the solution itself) and once without (for the node variables). It might thus be nice to have the ability for reinterpolation, just so you can have only one VTK file for a quick'n'dirty visualization.

I'm not sure that I understand your thoughts correctly. I don't think we need two different files here either way. It should be enough to just delete the implication of parameter reinterpolation for the subcell stuff.

I think that @sloede is correct. It would be annoying but you could create two separate files where the solution re-interpolates but the alphas do not. It would be something like (note I have not tested this)

trixi2vtk(joinpath("out", "solution_*.h5"), output_directory="out")
trixi2vtk(joinpath("out", "solution_*_celldata.h5"), output_directory="out", reinterpolate=false)

The second call might need to use ? instead of the * for the different files numbers.

@bennibolm
Copy link
Contributor Author

So, to avoid the annoying part with two files, I just added a comment if reinterpolation=true is called with a subcell coefficient.

For the testing, I added two new tests. After running elixir_euler_shockcapturing_subcell.jl I check for reinterpolation=true and false. I would suggest, that I first request a review on the current changes here and then, after every is okay here, I open a PR to Trixi2Vtk_reference_files to add the files and (hopefully) fix both new tests.

@bennibolm bennibolm marked this pull request as ready for review October 25, 2023 08:58
@bennibolm bennibolm requested a review from sloede October 25, 2023 08:59
@bennibolm bennibolm removed the request for review from sloede February 1, 2024 14:18
@bennibolm
Copy link
Contributor Author

bennibolm commented Apr 22, 2024

After a long time, I remembered this PR and think it is important to merge it into main. Otherwise, no visualization of the subcell limiting coefficient is possible.
I think after the discussion above, it's acceptable to merge the current version where also the subcell limiting coefficient is reinterpolated when enabled with reinterpolate=true (giving a warning). Though, for creating vtu files with subcell limiting reinterpolate=false should be used normally.
Doing this automatically would require two files and two parallel calls of trixi2vtk.

@bennibolm bennibolm requested a review from sloede April 22, 2024 16:05
test/test_2d.jl Outdated Show resolved Hide resolved
Copy link
Member

@sloede sloede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
test/test_2d.jl Outdated Show resolved Hide resolved
@bennibolm
Copy link
Contributor Author

bennibolm commented Jun 21, 2024

@sloede I adapted the hash number TEST_REFERENCE_COMMIT to the current commit in Trixi2Vtk_reference_files. So, the tests are actually testing now. But two issues now:

  1. Some tests fail due to differences in the files. This is probably because I used julia 1.10 for creating the files. Is there a useful way to just take the file the CI created (like the artifacts folder when running locally)? And if so, where? In the past, I had so many failing tests because of different machines/versions/..., such that I don't feel confident with just creating the reference files locally with julia 1.9.

  2. In the test with julia version 1.7 the following error occurs:
    SystemError: opening file "/home/runner/.julia/packages/Trixi/enFLf/examples/tree_2d_dgsem/elixir_euler_sedov_blast_wave_sc_subcell.jl": No such file or directory. Does it use an old version of Trixi without any subcell limiting elixirs? I haven't found a possible reason for that yet.
    Also, for julia 1.7 only the 2d tests are running. Probably, when finding the reason for that, also explains why it gives the error above.

@sloede
Copy link
Member

sloede commented Jun 21, 2024

Re 1: Here you should be able to get the artifacts files for older runs: https://github.com/trixi-framework/Trixi2Vtk.jl/actions/runs/9328145281. I am not 100% sure how to get them if the initial run fails - maybe try to make the test pass artificially, download the reference files, and then update them? I know this is a finicky business, but nobody has yet put in the time for a proper testing setup for Trixi2Vtk.jl... if you're up for it, we could sit together and try to come up with some strategies on how to mitigate this 😊

Re 2: Yes, Julia v1.7 heavily downgrades Trixi.jl. We switched to a minimum of Julia v1.8 over a year ago (https://github.com/trixi-framework/Trixi.jl/blame/06407f44041a04352800b7add4b20a9ac235f4ae/Project.toml), so your tests probably require Juila v1.8 or later. I'd just put the tests in an if condition for the Julia version to check if that fixes it.

@bennibolm
Copy link
Contributor Author

Re 1: Here you should be able to get the artifacts files for older runs: https://github.com/trixi-framework/Trixi2Vtk.jl/actions/runs/9328145281. I am not 100% sure how to get them if the initial run fails - maybe try to make the test pass artificially, download the reference files, and then update them?

Okay thanks. It seems like the artifacts are saved anywhere. So, I can directly download the files from https://github.com/trixi-framework/Trixi2Vtk.jl/actions/runs/9614040983.

I know this is a finicky business, but nobody has yet put in the time for a proper testing setup for Trixi2Vtk.jl... if you're up for it, we could sit together and try to come up with some strategies on how to mitigate this 😊

I'm not sure if I want to spend time on that 😶

Re 2: Yes, Julia v1.7 heavily downgrades Trixi.jl. We switched to a minimum of Julia v1.8 over a year ago (https://github.com/trixi-framework/Trixi.jl/blame/06407f44041a04352800b7add4b20a9ac235f4ae/Project.toml), so your tests probably require Juila v1.8 or later. I'd just put the tests in an if condition for the Julia version to check if that fixes it.

Okay thanks. Will do it.

@sloede
Copy link
Member

sloede commented Jun 24, 2024

I know this is a finicky business, but nobody has yet put in the time for a proper testing setup for Trixi2Vtk.jl... if you're up for it, we could sit together and try to come up with some strategies on how to mitigate this 😊

I'm not sure if I want to spend time on that 😶

I totally understand. Just keep in mind that all this infrastructure around Trixi.jl, including pre-/postprocessing was built by someone who probably should have been doing other things. Not saying it needs to be you, but things like Trixi2Vtk.jl don't build themselves (yet 😬).

@bennibolm
Copy link
Contributor Author

It worked with downloading the reference file from the artifact page. Unfortunately, the test fails on macOS.
I downloaded that vtu file from the last run of macOS as well, and sadly it seems like they differ significantly and the test only run through with very high tolerances.

julia> # downloaded file for macOS
julia> mac_file = "/home/bbolm/Downloads/test-output-files-1.9-macOS-latest-x64(1)/2d-treemesh-shockcapturing-subcell-reinterp-solution_000010.vtu"

julia> # current reference file (downloaded from ubuntu)
julia> ubuntu_file = "/home/bbolm/Downloads/test-output-files-1.9-ubuntu-latest-x64(1)/2d-treemesh-shockcapturing-subcell-reinterp-solution_000010.vtu"

julia> compare_cell_data(out_file, ref_file, atol=1.0e-12, rtol=1.0e-8)
ERROR: There was an error during testing

julia> compare_cell_data(out_file, ref_file, atol=1.0e-12, rtol=1.0e-3)
ERROR: There was an error during testing

julia> compare_cell_data(out_file, ref_file, atol=1.0e-12, rtol=1.0e-2)

julia> compare_cell_data(out_file, ref_file, atol=1.0e-2, rtol=1.0e-3)
ERROR: There was an error during testing

julia> compare_cell_data(out_file, ref_file, atol=1.0e-1, rtol=1.0e-3)

So, the abs tolerance has to be about 1.0e-1 or the rel tol 1e-3. Can this be correct somehow?
At this point I'd rather remove the macOS test than use such tolerances...

And for the test without reinterpolation almost the same. There the tests work with atol=1.0e-2 or rtol=1.0e-2.

julia> mac_file = "/home/bbolm/Downloads/test-output-files-1.9-macOS-latest-x64(1)/2d-treemesh-shockcapturing-subcell-no-reinterp-solution_000010.vtu"

julia> ubuntu_file = "/home/bbolm/Downloads/test-output-files-1.9-ubuntu-latest-x64(1)/2d-treemesh-shockcapturing-subcell-no-reinterp-solution_000010.vtu"

I don't really know what to do with that 😕 @sloede What so you think?

@sloede
Copy link
Member

sloede commented Jun 24, 2024

Do the differences in the solution between macOS and Ubuntu already exist before converting the files to VTK? That is, is the 1e-4 difference already present in the Trixi.jl solution (files)? And did you experience something similar for the Trixi.jl tests on macOS?

@bennibolm
Copy link
Contributor Author

Do the differences in the solution between macOS and Ubuntu already exist before converting the files to VTK? That is, is the 1e-4 difference already present in the Trixi.jl solution (files)? And did you experience something similar for the Trixi.jl tests on macOS?

That's a good question. No, within CI runs, I never really experienced differences between linux and macos in Trixi.jl before. What I experienced was, that the subcell limiting itself is very sensitive to small changed (per construction). So, small differences can grow bigger and bigger over time.
I just looked into the differences, and they occur for all variables (rho, v1, ...) and not only for the limiting coefficient.

Note: Since even the tests without any reinterpolation are failing for macOS, where all data is only copied and reshaped within Trixi2Vtk, it is not only a problem here but maybe in the compression in between.

I'm no expert on the Vtk files, but for me, it looks like the vtk file saves the data with type UInt8. routine here

julia> vtk = VTKFile(out_filename).appended_data
21262-element Vector{UInt8}
[...]

Then, in get_point_data it is converted back to Float64.

julia> out_point_data = get_point_data(vtk)
julia> out_data = get_data(out_point_data[variable_name])
julia> out_data
1024-element reinterpret(Float64, ::Vector{UInt8}):
[...]

Is it possible that subcell limiting produces slightly different solutions on the different machines, which grow bigger due to the compression in the vtu file? Although, the largest difference for rho is about 3.0e-3, which is quite large 🤔

@sloede
Copy link
Member

sloede commented Jun 27, 2024

Is it possible that subcell limiting produces slightly different solutions on the different machines, which grow bigger due to the compression in the vtu file? Although, the largest difference for rho is about 3.0e-3, which is quite large

The compression is lossless, so I don't really think that this is the reason.

Note: Since even the tests without any reinterpolation are failing for macOS, where all data is only copied and reshaped within Trixi2Vtk, it is not only a problem here but maybe in the compression in between.

But this indicates that the difference is already there in the Trixi.jl solution, i.e., before Trixi2Vtk touches it, doesn't it?

@bennibolm
Copy link
Contributor Author

Note: Since even the tests without any reinterpolation are failing for macOS, where all data is only copied and reshaped within Trixi2Vtk, it is not only a problem here but maybe in the compression in between.

But this indicates that the difference is already there in the Trixi.jl solution, i.e., before Trixi2Vtk touches it, doesn't it?

Yes, that's true. I just took a look into the ci file in Trixi. Do I get this correctly that we don't run the "normal" tests on macos and windows? Only the threaded and mpi tests.
If so, that wasn't clear to me. Then, of course, it is highly likely that already in Trixi the results are quite different on macOS.
(When looking at the new PR including tests with julia 1.11 (with errors of about 1e-4) and remembering that we already had to adapt the tests for the move to 1.10, it is even more likely. Of course, both of it for linux)

In the past, I always had this issue, mostly with the blast elixirs. Not really sure why that is. Do you think I sould spend some time on investigating this? I mean, the method itself is very sensitive to small changes by construction.
A short-term fix for this current PR here, could be to accept the differences and exclude the test from ci for macOS. What do you think? @sloede

@sloede
Copy link
Member

sloede commented Jul 3, 2024

Do I get this correctly that we don't run the "normal" tests on macos and windows? Only the threaded and mpi tests.

Yes. Since fewer macOS runners are available to us on the free plan, we only run some basic tests on macOS and assume that everything else "just works" as well. Originally, we tested everything everywhere, but this just wasn't feasible anymore due to long CI runtimes.

Do you think I sould spend some time on investigating this? I mean, the method itself is very sensitive to small changes by construction.

This is something you need to discuss with your advisor, if it is worth to investigate right now (maybe bring it up on Friday?). IMHO it is at least somewhat unfortunate that floating point truncation errors seem to have such a significant impact already. It indicates to me that either there's something wrong with the method itself (unlikely), the implementation (less unlikely, but still), or that the setups are not robust enough (my current favorite). By robust I mean that maybe you are not running a stable simulation but rather simulations that are just not crashing. I wonder if choosing slightly different parameters would allow you to alleviate the issue (thus confirming my suspicion) or whether it is really by design.

A short-term fix for this current PR here, could be to accept the differences and exclude the test from ci for macOS.

Yes. But please leave a short comment explaining the reason for this and x-ref this PR (#67)

@bennibolm
Copy link
Contributor Author

IMHO it is at least somewhat unfortunate that floating point truncation errors seem to have such a significant impact already. It indicates to me that either there's something wrong with the method itself (unlikely), the implementation (less unlikely, but still), or that the setups are not robust enough (my current favorite). By robust I mean that maybe you are not running a stable simulation but rather simulations that are just not crashing. I wonder if choosing slightly different parameters would allow you to alleviate the issue (thus confirming my suspicion) or whether it is really by design.

Yes, I agree. I took a quick look into it and made a list of failing tests. I will add this here but will present this also on friday:
Subcell Limiting in Trixi:

• Failing tests for julia 1.11
    ◦ elixir_euler_blast_wave_sc_subcell_nonperiodic.jl 
      initial_condition_blast_wave
      local limiting of rho and entropy_math
      initial refinement level = 4 (in test)
      Dirichlet Boundary Conditions
    ◦ elixir_euler_sedov_blast_wave_sc_subcell.jl
      initial_condition_sedov_blast_wave
      local limiting of rho and entropy_guermond_etal
      initial_refinement_level=4
      periodic
    ◦ elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl
      initial_condition_shock_bubble
      local limiting of rhos
      initial refinement level = 3
      periodic
• Working tests:
    ◦ elixir_euler_shockcapturing_subcell.jl (Positivity of rho, initial_refinement_level = 5)
    ◦ elixir_euler_kelvin_helmholtz_instability_sc_subcell.jl (Positivity of rho and p, in_refine_level=5)
    ◦ elixir_eulermulti_shock_bubble_shockcapturing_subcell_positivity (Positivity of rhos, in_ref_level=3)
    ◦ elixir_mhd_shockcapturing_subcell (Positivity of rho and p, initial_refinement_level=4)
    ◦ elixir_euler_sedov_blast_wave_sc_subcell (StructuredMesh, Positivity of Rho and p, 16x16)
    ◦ elixir_euler_sedov_blast_wave_sc_subcell (StructuredMesh, Local limiting of rho and entropy_guermond, 16x16)
• Used setup in tests of Trixi2Vtk
    ◦ elixir_euler_sedov_blast_wave_sc_subcell.jl (see above, local limiting, within failing tests for switch to julia 1.10 and 1.11)
• Switch to julia 1.10
    ◦ elixir_euler_sedov_blast_wave_sc_subcell.jl
    ◦ elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl

After that (but of course this must be investigated further) I would assume your are right with:

[...] or that the setups are not robust enough (my current favorite). By robust I mean that maybe you are not running a stable simulation but rather simulations that are just not crashing. I wonder if choosing slightly different parameters would allow you to alleviate the issue (thus confirming my suspicion) [...]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants