Speed up nonsparse AD initial setup significantly #16089

lindsayad · 2020-11-05T17:30:34Z

We have been setting ADReal::do_derivatives=true by default and only
toggling to false during residual computation. This can be very bad
for non-sparse calculations. For instance, in a Navier-Stokes input file
supplied by @gridley, I saw 10 seconds spent in initial condition
computation and 9 seconds in ComputeMaterialsObjectThread for a total
of 19 seconds in FEProblemBase::initialSetup. This was 27% of the
total computation time! With the changes here
FEProblemBase::initialSetup no longer even appears in the graph, which
is how it should be.
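
To make the cost concrete, here is a minimal sketch of a dense (non-sparse) dual number guarded by a global flag. It is only an illustration of the idea behind `ADReal::do_derivatives`; `ToyDual`, `multiply`, and the size parameter `N` are hypothetical names and not the MetaPhysicL API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a non-sparse dual number.
// N plays the role of the fixed derivative-vector size.
template <std::size_t N>
struct ToyDual
{
  static bool do_derivatives; // global toggle, analogous in spirit to ADReal::do_derivatives
  double value = 0.0;
  std::array<double, N> derivs{}; // dense storage: always N entries
};

template <std::size_t N>
bool ToyDual<N>::do_derivatives = true;

// Product rule: d(a*b) = a*db + b*da.
// The O(N) loop is skipped entirely when the flag is off.
template <std::size_t N>
ToyDual<N> multiply(const ToyDual<N> & a, const ToyDual<N> & b)
{
  ToyDual<N> out;
  out.value = a.value * b.value;
  if (ToyDual<N>::do_derivatives)
    for (std::size_t i = 0; i < N; ++i)
      out.derivs[i] = a.value * b.derivs[i] + b.value * a.derivs[i];
  return out;
}
```

With a dense container every arithmetic operation pays for all `N` entries whenever the flag is on, which is why leaving it on during setup phases that never need a Jacobian can dominate the run time.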

Graph with old toggling

Graph with new toggling

I recall that I was forced to do derivative calculations by default for
the reasons stated in the comment I'm deleting. My memory says some
phase field object was initializing its material properties in its
constructor, or more likely during initialSetup, so we had to enable
derivative calculations then or else its results would be garbage.
Hopefully, however, if there are objects still doing things like that we
can do a little more fine-grained control to make their objects work
without significantly sabotaging everyone's performance.

Refs #14701

lindsayad · 2020-11-05T18:46:25Z

Approving a PR with 1/3 tests failing. I like that. Ballsy

aeslaughter · 2020-11-05T19:07:03Z

> Approving a PR with 1/3 tests failing. I like that. Ballsy

I live life on the edge.

moosebuild · 2020-11-05T19:22:08Z

Job Documentation on ed82e42 wanted to post the following:

View the site here

This comment will be updated on new commits.

lindsayad · 2020-11-05T20:31:50Z

@rwcarlsen what do you think about having lock_guards for this? I have them in b63dc3d but took them out in ddfb1fb. It feels silly to have them in the FEProblemBase sections, but maybe it makes sense to have them in the Coupleable sections and/or in other objects that have thread copies.

lindsayad · 2020-11-05T20:52:29Z

Sigh...I'm actually pretty torn about what to do here. BISON, for example, initializes const ADReal with getParam<Real> values in object constructors. This is quite silly as it imposes AD math where there is clearly no need for it...but this is a pretty easy "mistake" for users to make. If ADReal::do_derivatives is false by default, then these types of ADReals will have uninitialized garbage in their derivative vectors when it comes time for Jacobian computation.

Maybe I should continue to have the ADReal::do_derivatives be true most of the time, and just explicitly turn it off in things like ComputeInitialConditionThread and ComputeMaterialsObjectThread where we are less likely to hit user code that might be initializing AD quantities.
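
One way to turn the flag off only inside selected setup code is a scoped (RAII) toggle, sketched below. This is an assumption about how such a helper could look, not code from this PR: `ToyADReal` stands in for the real flag holder, and `ScopedDerivativeToggle` and `flagSeenDuringInitialConditions` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for the global ADReal::do_derivatives flag (assumption for this sketch).
struct ToyADReal
{
  static bool do_derivatives;
};
bool ToyADReal::do_derivatives = true;

// Hypothetical RAII helper: sets the flag on construction and restores
// the previous value on destruction, even if the scope exits early.
class ScopedDerivativeToggle
{
public:
  explicit ScopedDerivativeToggle(bool value) : _old(ToyADReal::do_derivatives)
  {
    ToyADReal::do_derivatives = value;
  }
  ~ScopedDerivativeToggle() { ToyADReal::do_derivatives = _old; }

  ScopedDerivativeToggle(const ScopedDerivativeToggle &) = delete;
  ScopedDerivativeToggle & operator=(const ScopedDerivativeToggle &) = delete;

private:
  const bool _old;
};

// Example: a setup-phase routine that does not need derivative work.
// Returns the flag value observed inside the guarded scope.
bool flagSeenDuringInitialConditions()
{
  ScopedDerivativeToggle off(false);
  return ToyADReal::do_derivatives;
}
```

The default stays `true`, so user constructors that initialize AD quantities keep working, while the guarded scopes skip the dense derivative arithmetic.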

This is of course another place where a sparse container is much less problematic than the nonsparse one. If ADReal::do_derivatives = true and we invoke a dual number operation by accident there is far less penalty.

I could use some input from @rwcarlsen @friedmud and @fdkong perhaps

lindsayad · 2020-11-05T20:52:52Z

Also @dschwen

fdkong · 2020-11-05T21:24:08Z

Ideally, ADReal::do_derivatives = false for everything except ComputeJacobian. We should figure out a way to track what objects are needed when computing the Jacobian matrix. However, it might be difficult to figure out the dependency.

lindsayad · 2020-11-05T21:32:10Z

> Ideally, ADReal::do_derivatives = false for everything except ComputeJacobian.

I totally agree. But it is hard to stop users from shooting themselves in the foot. For example, look at what I have to do in 4c1a555. The developer who wrote ADPowerLawCreepStressUpdate isn't even doing anything dumb there. They only set _exponential to something other than 1 if _temperature is coupled. Given that, I think it's reasonable for them to initialize to 1 in the constructor.

lindsayad · 2020-11-05T21:35:51Z

Yea @dschwen wrote that code and we're not going to get any user-developers more wily than he.

dschwen · 2020-11-05T21:49:25Z

:-O

dschwen · 2020-11-05T21:52:34Z

Can't you have the constructor of ADReal clear the derivative vector to zero even if ADReal::do_derivatives = false, or would the overhead be too large? The code you pointed to looked fine before the change but crazy unintuitive after the change :-/

lindsayad · 2020-11-05T22:43:21Z

Ok:

FEProblemBase::initialSetup timings

- no derivatives work, ADReal::do_derivatives = false: 1.59 s, 2.94% of simulation
- ADReal::do_derivatives = false but we zero the derivatives vector when ADReal is constructed with a std::is_convertible<T2,Real> type: 3.28 s, 5.83% of simulation
- all derivatives work, ADReal::do_derivatives = true: 19.46 s, 27.11% of simulation
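
The middle option can be sketched as a converting constructor that zeroes the derivative storage unconditionally. This is a simplified illustration under assumptions, not the actual DualNumber constructor: `ToyDual` is a hypothetical type, and `double` stands in for `Real`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical dense dual number; not the MetaPhysicL implementation.
template <std::size_t N>
struct ToyDual
{
  static bool do_derivatives;
  double value;
  std::array<double, N> derivs; // deliberately no default initializer

  // Cheap default constructor: members are left indeterminate.
  ToyDual() {}

  // Construct-from-scalar: enabled only for types convertible to double.
  // The derivative vector is zeroed regardless of do_derivatives, so a
  // constant created while the flag is off is still safe to use later.
  template <typename T2,
            typename std::enable_if<std::is_convertible<T2, double>::value, int>::type = 0>
  ToyDual(const T2 & v) : value(static_cast<double>(v))
  {
    derivs.fill(0.0);
  }
};

template <std::size_t N>
bool ToyDual<N>::do_derivatives = true;
```

The extra cost is one `N`-entry fill per scalar construction, which is consistent in spirit with the timings above sitting between the two extremes.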

lindsayad · 2020-11-05T22:48:45Z

That seems like a fair compromise of safety and speed. Let's see if changing that one DualNumber constructor is sufficient to catch all our MOOSE use cases...

moosebuild · 2020-11-05T22:54:24Z

libmesh

@@ -1 +1 @@
-Subproject commit 4f3fa5a6a2104ab8784a6519677589738b9aef6f
+Subproject commit 23a208e65851b46dba8ebb2822147187d8b7fa4b


Caution! This contains a submodule update

lindsayad · 2020-11-06T00:26:24Z

Now I'm just getting a crap-ton of valgrind errors out of parsed_function.h with zero help for a stack trace:

==3004428== Conditional jump or move depends on uninitialised value(s)
==3004428==    at 0x5D4AD74: norm (type_vector.h:946)
==3004428==    by 0x5D4AD74: NearestNodeThread::operator()(libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long> const&) (NearestNodeThread.C:55)
==3004428==    by 0x5D5E13A: void libMesh::Threads::parallel_reduce<libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long>, NearestNodeThread>(libMesh::StoredRange<__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >, unsigned long> const&, NearestNodeThread&) (threads_pthread.h:380)
==3004428==    by 0x5D4C0EF: NearestNodeLocator::findNodes() (NearestNodeLocator.C:176)
==3004428==    by 0x5D54CD8: GeometricSearchData::update(GeometricSearchData::GeometricSearchType) (GeometricSearchData.C:66)
==3004428==    by 0x53BCF16: DisplacedProblem::updateMesh(bool) (DisplacedProblem.C:249)
==3004428==    by 0x53D877B: FEProblemBase::computeResidualTags(std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&) (FEProblemBase.C:5324)
==3004428==    by 0x539F93E: FEProblemBase::computeResidualInternal(libMesh::NumericVector<double> const&, libMesh::NumericVector<double>&, std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&) (FEProblemBase.C:5222)
==3004428==    by 0x53968ED: FEProblemBase::computeResidualSys(libMesh::NonlinearImplicitSystem&, libMesh::NumericVector<double> const&, libMesh::NumericVector<double>&) (FEProblemBase.C:5154)
==3004428==    by 0x71D9731: libmesh_petsc_snes_mffd_residual (petsc_nonlinear_solver.C:267)
==3004428==    by 0x71DA1FF: libmesh_petsc_snes_mffd_interface (petsc_nonlinear_solver.C:300)
==3004428==    by 0x96482C5: MatMult_MFFD (mffd.c:384)
==3004428==    by 0x9706F17: MatMult_Shell (shell.c:1068)
==3004428==  Uninitialised value was created by a stack allocation
==3004428==    at 0x55CDD4B: libMesh::ParsedFunction<double, libMesh::VectorValue<double> >::partial_reparse(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (parsed_function.h:523)

lindsayad · 2020-11-06T00:30:26Z

I could not have less clue looking through the function parser code where DualNumbers might actually be getting constructed

lindsayad · 2020-11-06T01:06:04Z

~~I'm guessing this is the "problematic" code:~~

ADReal
ADFParser::Eval(const ADReal * vars)
{
  mooseAssert(compiledFunction, "ADFParser objects must be JIT compiled before evaluation!");
  ADReal ret;
  (*reinterpret_cast<CompiledFunctionPtr<ADReal>>(compiledFunction))(&ret, vars, pImmed, _epsilon);
  return ret;
}

That is default initialization as opposed to value initialization (which is what I would want, generally speaking), and then probably assignment later, probably copy or move assignment as opposed to ADReal ret = some_real. Am I going to have to undo roystgnr/MetaPhysicL#34? That seems backwards.
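
For readers less familiar with the distinction, the sketch below shows value initialization on a hypothetical plain struct (`PlainDual` and `makeValueInitialized` are illustrative names). Note the assumption: this type has no user-provided constructor, so empty braces zero every member. Whether `ADReal ret{}` would zero the derivatives depends on what the DualNumber default constructor does, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain aggregate standing in for a dual number.
struct PlainDual
{
  double value;
  double derivs[4];
};

PlainDual makeValueInitialized()
{
  // Empty braces: every member is zero-initialized.
  PlainDual ret{};
  return ret;
}

// By contrast, `PlainDual ret;` (default initialization) would leave
// value and derivs indeterminate; reading them before assignment is
// undefined behavior, which is the kind of thing Valgrind reports.
```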

lindsayad · 2020-11-06T01:50:26Z

Valgrind is getting fooled by optimizations. In debug mode, I get the right stack traces:

==3106187==  Uninitialised value was created by a heap allocation
==3106187==    at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3106187==    by 0x8AA40FB: MooseArray<libMesh::VectorValue<MetaPhysicL::DualNumber<double, MetaPhysicL::NumberArray<50ul, double>, true> > >::resize(unsigned int, libMesh::VectorValue<MetaPhysicL::DualNumber<double, MetaPhysicL::NumberArray<50ul, double>, true> > const&) (MooseArray.h:250)
==3106187==    by 0x8A4DC68: FEProblemBase::updateMaxQps() (FEProblemBase.C:4651)
==3106187==    by 0x8A4E2E3: FEProblemBase::createQRules(libMesh::QuadratureType, libMesh::Order, libMesh::Order, libMesh::Order, unsigned short) (FEProblemBase.C:4695)
==3106187==    by 0x97BD2F5: SetupQuadratureAction::act() (SetupQuadratureAction.C:62)
==3106187==    by 0x9783C29: Action::timedAct() (Action.C:93)
==3106187==    by 0x9788E17: ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ActionWarehouse.C:380)
==3106187==    by 0x97888DD: ActionWarehouse::executeAllActions() (ActionWarehouse.C:341)
==3106187==    by 0x9E6D539: MooseApp::runInputFile() (MooseApp.C:893)
==3106187==    by 0x9E6E6A3: MooseApp::run() (MooseApp.C:1058)
==3106187==    by 0x11E637: main (main.C:36)

Default constructor for TypeVector:

template <typename T>
inline
TypeVector<T>::TypeVector ()
{
  _coords[0] = {};

#if LIBMESH_DIM > 1
  _coords[1] = {};
#endif

#if LIBMESH_DIM > 2
  _coords[2] = {};
#endif
}

So this would require applying the same patch to DualNumber move-assignment as I did to DualNumber construct-from-scalar. And then if I'm applying it to that, maybe I should apply it to all assignment operations...and all construction operations as well. This seems more and more repugnant to me. We created that static do_derivatives flag to really not do derivatives at the MetaPhysicL level, and now we're talking about doing derivatives sometimes and sometimes not, irrespective of whether the flag is false. All this is making me lean towards simply closing this PR and pushing more and more towards #16091
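
The reason the construct-from-scalar patch does not cover `_coords[0] = {};` can be seen with a small instrumented type. This is a hedged sketch: `Tracked`, its counters, and `zeroFirst` are hypothetical, and only a copy-assignment operator is declared to keep overload resolution simple (a real class with a move-assignment operator would take that path instead).

```cpp
#include <cassert>

// Hypothetical type that counts which special member functions run.
struct Tracked
{
  static int default_ctors;
  static int scalar_ctors;
  static int assigns;

  double value;

  Tracked() : value(0.0) { ++default_ctors; }
  Tracked(double v) : value(v) { ++scalar_ctors; } // construct-from-scalar
  Tracked(const Tracked &) = default;
  Tracked & operator=(const Tracked & other)
  {
    value = other.value;
    ++assigns;
    return *this;
  }
};

int Tracked::default_ctors = 0;
int Tracked::scalar_ctors = 0;
int Tracked::assigns = 0;

// Mirrors the pattern in the TypeVector default constructor:
// `{}` builds a temporary via the default constructor, then the
// assignment operator copies it in. The scalar constructor is never used.
void zeroFirst(Tracked (&coords)[3])
{
  coords[0] = {};
}
```

So any zeroing logic that lives only in the scalar constructor is bypassed on this path, which matches the conclusion that assignment and other constructors would need the same treatment.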

tophmatthews · 2020-11-06T02:37:10Z

> const ADReal with getParam<Real> values in object constructors.

I can help clean those up if you point them out; they're probably mine...

rwcarlsen · 2020-11-09T20:41:27Z

@lindsayad - objects with thread copies are the only places that don't need to use the lock guard. I'd maybe try making a function like this - basically trying to encapsulate the mutex and logic required for taking the lock, etc:

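void FEProblemBase::withDerivativesAs(bool do_derivatives, std::function<void()> func)
{
  // Hold the lock for the whole call so no other caller can change the flag
  std::lock_guard<std::mutex> guard(_do_derivatives_mutex);
  const bool old_value = ADReal::do_derivatives;
  ADReal::do_derivatives = do_derivatives;
  func();
  ADReal::do_derivatives = old_value;
}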

Then basically use this function everywhere - explicitly doing operations with it set a particular way. But I haven't really dug into all the code and details here - so I'll trust what you end up doing makes the most sense.
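
A self-contained version of that idea might look like the following. It is only a sketch under assumptions: `ToyADReal` and `ToyProblem` are hypothetical stand-ins for `ADReal` and `FEProblemBase`, and `_do_derivatives_mutex` is the assumed member name from the snippet above.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Stand-in for the global ADReal::do_derivatives flag (assumption).
struct ToyADReal
{
  static bool do_derivatives;
};
bool ToyADReal::do_derivatives = true;

// Hypothetical stand-in for FEProblemBase.
class ToyProblem
{
public:
  // Runs func with the flag forced to do_derivatives, then restores the
  // previous value. The mutex serializes all callers of this method.
  void withDerivativesAs(bool do_derivatives, const std::function<void()> & func)
  {
    std::lock_guard<std::mutex> guard(_do_derivatives_mutex);
    const bool old_value = ToyADReal::do_derivatives;
    ToyADReal::do_derivatives = do_derivatives;
    try
    {
      func();
    }
    catch (...)
    {
      // Restore the flag before propagating the exception
      ToyADReal::do_derivatives = old_value;
      throw;
    }
    ToyADReal::do_derivatives = old_value;
  }

private:
  std::mutex _do_derivatives_mutex;
};
```

Because `std::mutex` is not recursive, calling `withDerivativesAs` again from inside `func` on the same object would deadlock, so nested use would need a different design.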

aeslaughter previously approved these changes Nov 5, 2020

aeslaughter self-assigned this Nov 5, 2020

lindsayad dismissed aeslaughter’s stale review via ddfb1fb November 5, 2020 19:24

lindsayad mentioned this pull request Nov 5, 2020

Make global AD indexing with sparse container MOOSE default config #16091

Closed

lindsayad added a commit to roystgnr/MetaPhysicL that referenced this pull request Nov 5, 2020

Test derivative zeroing, regardless of do_derivatives

e3cd252

Refs idaholab/moose#16089

lindsayad added a commit to libMesh/libmesh that referenced this pull request Nov 5, 2020

Test derivative zeroing in MetaPhysicl

23a208e

Refs idaholab/moose#16089

lindsayad force-pushed the speedup-nonsparse branch from 4c1a555 to 17e76a5 Compare November 5, 2020 22:47

lindsayad added 2 commits November 5, 2020 14:47

Update libmesh for testing deriv zeroing

ed82e42

lindsayad force-pushed the speedup-nonsparse branch from 17e76a5 to ed82e42 Compare November 5, 2020 22:47

moosebuild reviewed Nov 5, 2020

lindsayad closed this Nov 6, 2020
