Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up nonsparse AD initial setup significantly #16089

Closed
wants to merge 2 commits into from

Conversation

lindsayad
Copy link
Member

We have been setting ADReal::do_derivatives=true by default and only
toggling to false during residual computation. This can be very bad
for non-sparse calculations. For instance in a navier-stokes input file
supplied by @gridley, I saw 10 seconds spent in initial condition
computation and 9 seconds in ComputeMaterialsObjectThread for a total
of 19 seconds in FEProblemBase::initialSetup. This was 27% of the
total computation time! With the changes here
FEProblemBase::initialSetup no longer even appears in the graph, which
is how it should be.

Graph with old toggling
nonsparse
Graph with new toggling
nonsparse-different-toggling

I recall that I was forced to do derivative calculations by default for
the reasons stated in the comment I'm deleting. My memory says some
phase field object was initializing its material properties in its
constructor, or more likely during initialSetup, so we had to enable
derivative calculations then or else its results would be garbage.
Hopefully, however, if there are objects still doing things like that we
can do a little more fine-grained control to make their objects work
without significantly sabotaging everyone's performance.

Refs #14701

aeslaughter
aeslaughter previously approved these changes Nov 5, 2020
@aeslaughter aeslaughter self-assigned this Nov 5, 2020
@lindsayad
Copy link
Member Author

Approving a PR with 1/3 tests failing. I like that. Ballsy

@aeslaughter
Copy link
Contributor

Approving a PR with 1/3 tests failing. I like that. Ballsy

I live life on the edge.

@moosebuild
Copy link
Contributor

moosebuild commented Nov 5, 2020

Job Documentation on ed82e42 wanted to post the following:

View the site here

This comment will be updated on new commits.

@lindsayad
Copy link
Member Author

@rwcarlsen what do you think about having lock_guards for this? I have them in b63dc3d but took them out in ddfb1fb. It feels silly to have it in the FEProblemBase sections. but maybe it makes sense to have them in the Coupleable sections and/or in other objects that have thread copies.

@lindsayad
Copy link
Member Author

Sigh...I'm actually pretty torn about what to do here. BISON for example initializes const ADReal with getParam<Real> values in object constructors. This is quite silly as it imposes AD math where there is clearly no need for it...but this is a pretty easy "mistake" for users to make. If ADReal::do_derivatives is false by default, then these types of ADReals will have uninitialized garbage in their derivative vectors when it comes time for jacobian computation.

Maybe I should continue to have the ADReal::do_derivatives be true most of the time, and just explicitly turn it off in things like ComputeInitialConditionThread and ComputeMaterialsObjectThread where we are less likely to hit user code that might be initializing AD quantities.

This is of course another place where a sparse container is much less problematic than the nonsparse one. If ADReal::do_derivatives = true and we invoke a dual number operation by accident there is far less penalty.

I could take some input from @rwcarlsen @friedmud and @fdkong perhaps

@lindsayad
Copy link
Member Author

Also @dschwen

@fdkong
Copy link
Contributor

fdkong commented Nov 5, 2020

Ideally, ADReal::do_derivatives = false for everything expect ComputeJacobian. We should figure out a way to track what objects are needed when computing the Jacobian matrix. However, it might be difficult to figure out the dependency.

@lindsayad
Copy link
Member Author

lindsayad commented Nov 5, 2020

Ideally, ADReal::do_derivatives = false for everything expect ComputeJacobian.

I totally agree. But it is hard to stop users from shooting themselves in the foot. For example look at what I have to do in 4c1a555. The developer who wrote ADPowerLawCreepStressUpdate isn't even doing anything dumb there. They only set _exponential to something other than 1 if _temperature is coupled. Given that I think it's reasonable for them to initialize to 1 in the constructor.

@lindsayad
Copy link
Member Author

lindsayad commented Nov 5, 2020

Yea @dschwen wrote that code and we're not going to get any user-developers more wily than he.

@dschwen
Copy link
Member

dschwen commented Nov 5, 2020

image

:-O

@dschwen
Copy link
Member

dschwen commented Nov 5, 2020

Can't you have the constructor of ADReal clear the derivative vector to zero even if ADReal::do_derivatives = false, or would the overhead be too large? The code you pointed to looked fine before the change but crazy unintuitive after the change :-/

@lindsayad
Copy link
Member Author

Ok:

FEProblemBase::initialSetup timings

  • no derivatives work, ADReal::do_derivatives = false: 1.59s, 2.94% of simulation
  • ADReal::do_derivatives = false but we zero derivatives vector when ADReal is constructed with a std::is_convertible<T2,Real> type: 3.28 s, 5.83% of simulation
  • all derivatives work, ADReal::do_derivatives = true: 19.46 s, 27.11% of simulation

lindsayad added a commit to roystgnr/MetaPhysicL that referenced this pull request Nov 5, 2020
lindsayad added a commit to libMesh/libmesh that referenced this pull request Nov 5, 2020
We have been setting `ADReal::do_derivatives=true` by default and only
toggling to `false` during residual computation. This can be very bad
for non-sparse calculations. For instance in a navier-stokes input file
supplied by Gavin Ridley, I saw 10 seconds spent in initial condition
computation and 9 seconds in `ComputeMaterialsObjectThread` for a total
of 19 seconds in `FEProblemBase::initialSetup`. This was 27% of the
total computation time! With the changes here
`FEProblemBase::initialSetup` no longer even appears in the graph, which
is how it should be.

I recall that I was forced to do derivative calculations by default for
the reasons stated in the comment I'm deleting. My memory says some
phase field object was initializing its material properties in its
constructor, or more likely during `initialSetup`, so we had to enable
derivative calculations then or else its results would be garbage.
Hopefully, however, if there are objects still doing things like that we
can do a little more fine-grained control to make their objects work
without significantly sabotaging everyone's performance.

Refs idaholab#14701
@lindsayad
Copy link
Member Author

That seems like a fair compromise of safety and speed. Let's see if changing that one DualNumber constructor is sufficient to catch all our MOOSE use cases...

@@ -1 +1 @@
Subproject commit 4f3fa5a6a2104ab8784a6519677589738b9aef6f
Subproject commit 23a208e65851b46dba8ebb2822147187d8b7fa4b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution! This contains a submodule update

@lindsayad
Copy link
Member Author

Now I'm just getting a crap-ton of valgrind errors out of parsed_function.h with zero help for a stack trace:

==3004428== Conditional jump or move depends on uninitialised value(s)
==3004428==    at 0x5D4AD74: norm (type_vector.h:946)
==3004428==    by 0x5D4AD74: NearestNodeThread::operator()(libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long> const&) (NearestNodeThread.C:55)
==3004428==    by 0x5D5E13A: void libMesh::Threads::parallel_reduce<libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long>, NearestNodeThread>(libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long> const&, NearestNodeThread&) (threads_pthread.h:380)
==3004428==    by 0x5D4C0EF: NearestNodeLocator::findNodes() (NearestNodeLocator.C:176)
==3004428==    by 0x5D54CD8: GeometricSearchData::update(GeometricSearchData::GeometricSearchType) (GeometricSearchData.C:66)
==3004428==    by 0x53BCF16: DisplacedProblem::updateMesh(bool) (DisplacedProblem.C:249)
==3004428==    by 0x53D877B: FEProblemBase::computeResidualTags(std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&) (FEProblemBase.C:5324)
==3004428==    by 0x539F93E: FEProblemBase::computeResidualInternal(libMesh::NumericVector<double> const&, libMesh::NumericVector<double>&, std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&) (FEProblemBase.C:5222)
==3004428==    by 0x53968ED: FEProblemBase::computeResidualSys(libMesh::NonlinearImplicitSystem&, libMesh::NumericVector<double> const&, libMesh::NumericVector<double>&) (FEProblemBase.C:5154)
==3004428==    by 0x71D9731: libmesh_petsc_snes_mffd_residual (petsc_nonlinear_solver.C:267)
==3004428==    by 0x71DA1FF: libmesh_petsc_snes_mffd_interface (petsc_nonlinear_solver.C:300)
==3004428==    by 0x96482C5: MatMult_MFFD (mffd.c:384)
==3004428==    by 0x9706F17: MatMult_Shell (shell.c:1068)
==3004428==  Uninitialised value was created by a stack allocation
==3004428==    at 0x55CDD4B: libMesh::ParsedFunction<double, libMesh::VectorValue<double> >::partial_reparse(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (parsed_function.h:523)

@lindsayad
Copy link
Member Author

I could not have less clue looking through the function parser code where DualNumbers might actually be getting constructed

@lindsayad
Copy link
Member Author

lindsayad commented Nov 6, 2020

I'm guessing this is the "problematic" code:

ADReal
ADFParser::Eval(const ADReal * vars)
{
  mooseAssert(compiledFunction, "ADFParser objects must be JIT compiled before evaluation!");
  ADReal ret;
  (*reinterpret_cast<CompiledFunctionPtr<ADReal>>(compiledFunction))(&ret, vars, pImmed, _epsilon);
  return ret;
}

Default initialization as opposed to value initialization (which is what I would want generally speaking). and then probably assignment later, probably copy or move assignment as opposed to ADReal ret = some_real. Am I going to have to undo roystgnr/MetaPhysicL#34 ? That seems backwards.

@lindsayad
Copy link
Member Author

Valgrind getting fooled by optmizations. In debug mode, I get the right stack traces:

==3106187==  Uninitialised value was created by a heap allocation
==3106187==    at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3106187==    by 0x8AA40FB: MooseArray<libMesh::VectorValue<MetaPhysicL::DualNumber<double, MetaPhysicL::NumberArray<50ul, double>, true> > >::resize(unsigned int, libMesh::VectorValue<MetaPhysicL::DualNumber<double, MetaPhysicL::NumberArray<50ul, double>, true> > const&) (MooseArray.h:250)
==3106187==    by 0x8A4DC68: FEProblemBase::updateMaxQps() (FEProblemBase.C:4651)
==3106187==    by 0x8A4E2E3: FEProblemBase::createQRules(libMesh::QuadratureType, libMesh::Order, libMesh::Order, libMesh::Order, unsigned short) (FEProblemBase.C:4695)
==3106187==    by 0x97BD2F5: SetupQuadratureAction::act() (SetupQuadratureAction.C:62)
==3106187==    by 0x9783C29: Action::timedAct() (Action.C:93)
==3106187==    by 0x9788E17: ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ActionWarehouse.C:380)
==3106187==    by 0x97888DD: ActionWarehouse::executeAllActions() (ActionWarehouse.C:341)
==3106187==    by 0x9E6D539: MooseApp::runInputFile() (MooseApp.C:893)
==3106187==    by 0x9E6E6A3: MooseApp::run() (MooseApp.C:1058)
==3106187==    by 0x11E637: main (main.C:36)

Default constructor for TypeVector:

template <typename T>
inline
TypeVector<T>::TypeVector ()
{
  _coords[0] = {};

#if LIBMESH_DIM > 1
  _coords[1] = {};
#endif

#if LIBMESH_DIM > 2
  _coords[2] = {};
#endif
}

So this would require applying the same patch to DualNumber move-assignment as I did to DualNumber construct-from-scalar. And then if I'm applying to that, maybe I should apply it all assignment operations...and all construction operations as well. This seems more and more repugnant to me. We created that static do_derivatives flag to really not do derivatives at the MetaPhysicL level, and now we're talking about doing derivatives sometimes and sometimes not irrespective of whether the flag is false. All this is making me lean towards simply closing this PR and pushing more and more towards #16091

@lindsayad lindsayad closed this Nov 6, 2020
@tophmatthews
Copy link
Contributor

const ADReal with getParam<Real> values in object constructors.

I help clean those up if you point them out, they're probably mine...

@rwcarlsen
Copy link
Contributor

@lindsayad - objects with thread copies are the only places that don't need to use the lock guard. I'd maybe try making a function like this - basically trying to encapsulate the mutex and logic required for taking the lock, etc:

void FEProblemBase::withDerivativesAs(bool do_derivatives, std::function<()> func)
{
  std::lock_guard<std::mutex> guard(_do_derivatives_mutex);
  func();
}

Then basically use this function everywhere - explicitly doing operations with it set a particular way. But I haven't really dug into all the code and details here - so I'll trust what you end up doing makes the most sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants