Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spalart Allmaras Test Failing #622

Open
pbauman opened this issue Nov 25, 2022 · 11 comments
Open

Spalart Allmaras Test Failing #622

pbauman opened this issue Nov 25, 2022 · 11 comments

Comments

@pbauman
Copy link
Member

pbauman commented Nov 25, 2022

Assertion `g < _variable_groups.size()' failed.
g = 0
_variable_groups.size() = 0


Stack frames: 11
0: libMesh::print_trace(std::ostream&)
1: libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*)
2: libMesh::DofMap::variable_group(unsigned int) const
3: libMesh::System::get_info[abi:cxx11]() const
4: libMesh::EquationSystems::get_info[abi:cxx11]() const
5: libMesh::EquationSystems::print_info(std::ostream&) const
6: GRINS::Simulation::print_sim_info()
7: GRINS::Simulation::run()
8: /home/pbauman/work/grins/build/master/devel/test/.libs/test_turbulent_channel() [0x412db7]
9: __libc_start_main
10: /home/pbauman/work/grins/build/master/devel/test/.libs/test_turbulent_channel() [0x413d9e]
[0] ./include/libmesh/dof_map.h, line 1943, compiled Nov 24 2022 at 11:13:00

git bisect points at libMesh/libmesh#3407, specifically libMesh/libmesh@b663ec5375

I haven't dug any deeper, at least wanted to capture the git bisect results.

@pbauman
Copy link
Member Author

pbauman commented Nov 25, 2022

Looks like Variables are getting dropped.

Input:

[Variables]
   [./Velocity]
      names = 'u v'
      fe_family = 'LAGRANGE'
      order = 'FIRST'
   [../Pressure]
      names = 'p'
      fe_family = 'LAGRANGE'
      order = 'FIRST'
   [../TurbulentViscosity]
      names = 'nu'
      fe_family = 'LAGRANGE'
      order = 'FIRST'
[]

Program echo:

 EquationSystems
  n_systems()=1
   System #0, "flow"
    Type "Implicit"
    Variables={ "u" "nutil" }
    Finite Element Types="LAGRANGE", "JACOBI_20_00"
    Infinite Element Mapping="CARTESIAN"
    Approximation Orders="FIRST", "THIRD"
    n_dofs()=662
    n_local_dofs()=662
    n_constrained_dofs()=0
    n_local_constrained_dofs()=0
    n_vectors()=2
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 9.96375
      Average Off-Processor Bandwidth <= 0
      Maximum  On-Processor Bandwidth <= 10
      Maximum Off-Processor Bandwidth <= 0
    DofMap Constraints
      Number of DoF Constraints = 0
      Number of Node Constraints = 0

@pbauman
Copy link
Member Author

pbauman commented Nov 25, 2022

Just to be clear, I'm not sure if this is a libMesh problem or a GRINS problem yet. Could be libMesh exposed a bad API usage or something.

@roystgnr
Copy link
Member

It's definitely at least a libMesh problem; at the very least we should be catching an error earlier and reporting it more informatively. So it's either a libMesh problem or a both problem, and I fear I know which of those Occam would pick...

@roystgnr
Copy link
Member

That is an astonishing bisect result, though. I can't see anything in that commit that I would expect to be remotely relevant. I'll try to replicate this weekend.

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

That is an astonishing bisect result, though.

Agreed, was a serious Whiskey Tango Foxtrot moment for me.

I've reproduced it by hand though. Using 54021ccfb of libMesh and GRINS tests pass (assuming you have the fix for the GhostingFunctor::map_type as well). Using b663ec537 of libMesh and that one test fails.

I do not know what is special about that test. We have plenty of others that should group together variables like this one. It's the only turbulence one, though, which is what makes me suspect this is a "both" problem.

I'm on Ubuntu 20.04 and here's my modules environment:

Currently Loaded Modules:
  1) gcc/11.2.0   2) openblas/0.3.20   3) mpich/4.0.1   4) petsc/3.16.6-opt   5) hdf5/1.8.21   6) boost/1.66.0   7) vtk/9.1.0   8) cppunit/1.15.1

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

My comment about variables getting dropped was incorrect. That was the info from the EquationSystems used for projecting the boundary values. We're dying the in print_info of the main system. I'm trying to step down into a debugger to compare what's happening between the failing and non-failing cases.

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

OK, it's something with the System associated with the DistanceFunction

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

So for the DistanceFunction system, the System._varaiable_groups and System.get_dof_map()._variable_groups are out of sync. The former had a size of 1 and the latter is empty.

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

And the variable groups are out-of-sync because libMesh::System::init_data() it not getting called on the distance function system.

@pbauman
Copy link
Member Author

pbauman commented Nov 26, 2022

Ohhhhh, that is sneaky. So here is what's going on, I'm not sure what the correct fix is. This seems to be a flaw with the Turbulence class hierarchy and possibly GRINS more generally.

So SpalartAllmaras overrides init_variables to, among other things, create a DistanceFunction object. It does it there because it needs access to the FEMSystem and EquationSystems. However, DistanceFunction itself is a subclass of System (well libMesh::System::Initialization) and in the construction of DistanceFunction, it adds itself to the EquationSystems.

The issue is that the call stack for this it initiated at equation_systems.C:93, i.e. a loop over the Systems present inside EquationSystems. What was happening before the libMesh PR is that n_systems() went up from 1 to 2, but in the for-loop, n_systems() was re-evaluated at each iteration of the loop, so the loop proceeded to the next iteration. After the PR, the make_range function creates IntRange types that takes the n_systems() value by value so it doesn't see the increment in n_systems().

I would argue that is not libMesh's fault and this has been broken on the GRINS side this entire time.

Now to think about the proper fix.

@pbauman
Copy link
Member Author

pbauman commented Nov 27, 2022

It seems to me we need a hook, Multiphysics::pre_init() and then subsequently Physics::pre_init() (or whatever better names there are) so that something like DistanceFunction can add itself to the EquationSystems and then be properly init'ed. Happy to hear other suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants