
[Coro] Amortize debug info processing cost in CoroSplit #109032

Open · wants to merge 15 commits into main from artempyanykh:fast-coro-upstream

Conversation

artempyanykh (Contributor)

More details about this stack are in this topic on Discourse.

TL;DR: In large modules with many coroutines, the cost of debug info metadata processing can get very high (e.g. adding over a minute to the compile time). This stack refactors function cloning and caches some module-level calculations to speed up the pass.

Anecdata for a sample C++ source file:

|                  | Baseline | IdentityMD set | Prebuilt GlobalDI | Cached CU DIFinder |
|------------------+----------+----------------+-------------------+--------------------|
| CoroSplitPass    | 306ms    | 221ms          | 68ms              | 17ms               |
| CoroCloner       | 101ms    | 72ms           | 0.5ms             | 0.5ms              |
| CollectGlobalDI  | -        | -              | 63ms              | 13ms               |
|------------------+----------+----------------+-------------------+--------------------|
| Overall speed up | 1x       | 1.4x           | 4.5x              | 18x                |

Each commit is individually buildable and reviewable. I can extract them into separate PRs if the overall direction makes sense.

Summary:
A helper (2 overloads) that consolidates CoroCloner creation and the actual cloning.
The helpers create a TimeTraceScope to make it easier to see how long the cloning takes.
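
For illustration, a minimal sketch of the timing part using LLVM's TimeTraceScope from llvm/Support/TimeProfiler.h; the wrapper function here is hypothetical, not the PR's actual helper:

```cpp
#include "llvm/Support/TimeProfiler.h"

// Hypothetical wrapper: everything that runs while Scope is alive is
// attributed to one named entry in -ftime-trace output, which is how the
// per-scope timings quoted in this PR were gathered.
static void cloneCoroutineBody(/* cloner arguments */) {
  llvm::TimeTraceScope Scope("CoroCloner");
  // ... create the CoroCloner and perform the actual cloning ...
}
```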

Test Plan:
ninja check-llvm-unit
Summary:
Consolidate the logic in a single function. We do an extra pass over the instructions, but this is
necessary to untangle things and extract metadata cloning in a future diff.

Test Plan:
ninja check-llvm-unit
Summary:
There was a single use of CloneBasicBlock with a non-default DebugInfoFinder inside
CloneFunctionInto, and it was refactored away in the previous diff.

Let's remove the parameter to keep the scope of the function more focused.

Test Plan:
ninja check-llvm-unit
…neFunctionInto

Summary:
This makes the flow of the function a bit more straightforward and makes it easier to extract more
into helper functions.

Test Plan:
ninja check-llvm-unit
Summary:
Extract the logic to build up a metadata map for use in metadata cloning into a separate function.

Test Plan:
ninja check-llvm-unit
Summary:
One potentially contentious point in the new function's API is that it expects the caller to
populate the VMap rather than doing metadata cloning wholesale inside it.

We'll need it this way for a subsequent change.

Test Plan:
ninja check-llvm-unit
Summary:
This and previously extracted functions will be used in a later diff.

Test Plan:
ninja check-llvm-unit
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set. Then when cloning a local
scope we'd check if it's from one of those 'distinct' SPs by checking if it's in the set.

We don't need to do that. We can just check against the cloned SP directly and drop the set.
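
Schematically, the check reduces to a direct comparison; this is a sketch with illustrative names, not the patch's exact code:

```cpp
#include "llvm/IR/DebugInfoMetadata.h"

// A local scope needs cloning only if it belongs to the subprogram being
// cloned; everything else maps to itself. A direct pointer comparison
// replaces the membership test against a set of all other subprograms.
static bool mustCloneScope(const llvm::DILocalScope *Scope,
                           const llvm::DISubprogram *ClonedSP) {
  return Scope->getSubprogram() == ClonedSP;
}
```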

Test Plan:
ninja check-llvm-unit
Summary:
The typedef was probably there because the type alias in ValueMap was private.

Test Plan:
ninja check-llvm-unit
Summary:
To avoid cloning 'global' debug info, the CloneFunction implementation used to eagerly identity map
a known subset of global debug info into ValueMap's MD map. In larger modules with a meaningful
volume of debug info this gets very expensive.

By passing such global metadata via an IdentityMD set for the ValueMapper to map on first use, we
get several benefits:

1. Mapping metadata is not cheap, particularly because of tracking. When cloning a Function we
identity map lots of global module-level metadata to avoid cloning it, while only a fraction of it
is actually used by the function. Mapping on first use is a lot faster for modules with a meaningful
amount of debug info (see the sketch after this list).

2. Eagerly identity mapping metadata makes it harder to cache module-level data (e.g. the set of
metadata nodes in a DICompileUnit). With this patch we can cache certain module-level metadata
calculations to speed things up further.
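
A hand-written sketch of the idea, not ValueMapper's real code: global nodes go into a side set and are identity mapped lazily at lookup time instead of eagerly up front.

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

DenseMap<Metadata *, Metadata *> MDMap; // per-clone mapping table
DenseSet<Metadata *> IdentityMD;        // global DI that must map to itself

Metadata *mapMD(Metadata *MD) {
  if (auto It = MDMap.find(MD); It != MDMap.end())
    return It->second;     // already mapped (cloned or identity)
  if (IdentityMD.contains(MD))
    return MDMap[MD] = MD; // identity map on first use, not eagerly
  // ... otherwise clone the node recursively, as the mapper normally does ...
  return nullptr;          // placeholder for the cloning path
}
```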

Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up
CoroSplitPass which is one of the heavier users of cloning:

|                 | Baseline | IdentityMD set |
|-----------------+----------+----------------|
| CoroSplitPass   | 306ms    | 221ms          |
| CoroCloner      | 101ms    | 72ms           |
|-----------------+----------+----------------|
| Speed up        | 1x       | 1.4x           |

Test Plan:
ninja check-llvm-unit
ninja check-llvm
…utine clones

Summary:
CoroCloner, by calling into CloneFunctionInto, does a lot of repeated work priming DIFinder and building
a list of global debug info metadata. For programs compiled with full debug info this gets very
expensive.

This diff builds the data once and shares it between all clones.
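
The shape of the change as a sketch; every name below (MetadataSet, collectGlobalDebugInfo, createClone) is illustrative rather than the PR's actual API:

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Metadata.h"
using MetadataSet = llvm::SmallPtrSet<llvm::Metadata *, 16>;

// Stand-ins for the real machinery (illustrative declarations only).
MetadataSet collectGlobalDebugInfo(llvm::Function &F); // primes DIFinder once
void createClone(llvm::Function &F, int CloneKind, const MetadataSet &GlobalDI);

void splitCoroutine(llvm::Function &F) {
  // Build the global debug info set once per coroutine...
  MetadataSet GlobalDI = collectGlobalDebugInfo(F);
  // ...and share it across the resume/destroy/cleanup clones.
  for (int Kind : {0, 1, 2})
    createClone(F, Kind, GlobalDI);
}
```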

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt GlobalDI (cur.) |
|-----------------+----------+----------------+--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectGlobalDI | -        | -              | 63ms                     |
|-----------------+----------+----------------+--------------------------|
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectGlobalDI happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample internal source file, checked time trace output for scope timings.
Summary:
The analysis simply primes and caches DebugInfoFinders for each DICompileUnit in a module. This
allows (future) callers like CoroSplitPass to compute global debug info metadata (required for
coroutine function cloning) much faster. Specifically, we pay the price of DICompileUnit processing
only once per compile unit, rather than once per coroutine.
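
Conceptually, the analysis amounts to something like this sketch; DebugInfoCacheSketch and prime are illustrative names, while DebugInfoFinder, Module::debug_compile_units, and processModule are real LLVM APIs:

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Module.h"

// One primed DebugInfoFinder per DICompileUnit, reused by every later caller.
struct DebugInfoCacheSketch {
  llvm::DenseMap<const llvm::DICompileUnit *, llvm::DebugInfoFinder> Finders;

  void prime(const llvm::Module &M) {
    for (const llvm::DICompileUnit *CU : M.debug_compile_units()) {
      // processModule stands in for the per-CU priming the analysis does;
      // the point is that this traversal runs once per CU, not per coroutine.
      Finders[CU].processModule(M);
    }
  }
};
```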

Test Plan:
Added a smoke test for the new analysis
ninja check-llvm-unit
Summary:
We can use a DebugInfoFinder from DebugInfoCache which is already primed on a compile unit to speed
up collection of global debug info.

The pass could likely be another 2x+ faster if we avoid rebuilding the set of global debug
info. This needs further massaging of CloneFunction and ValueMapper, though, and can be done
incrementally on top of this.
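
Inside the pass, usage would look roughly like this sketch, reusing the hypothetical DebugInfoCacheSketch from above rather than the PR's exact interface:

```cpp
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"

// Sketch: fetch the finder already primed for this coroutine's compile unit
// instead of re-running debug info discovery per coroutine. Assumes the
// cache was primed for every CU in the module beforehand.
void collectGlobalDIForCoroutine(llvm::Function &F,
                                 DebugInfoCacheSketch &Cache) {
  const llvm::DISubprogram *SP = F.getSubprogram();
  const llvm::DebugInfoFinder &DIFinder =
      Cache.Finders.find(SP->getUnit())->second;
  // ... assemble the global debug info set from DIFinder's contents ...
}
```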

Comparing performance of CoroSplitPass at various points in this stack, this is anecdata from a sample
cpp file compiled with full debug info:
|                 | Baseline | IdentityMD set | Prebuilt GlobalDI | Cached CU DIFinder (cur.) |
|-----------------+----------+----------------+-------------------+---------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms              | 17ms                      |
| CoroCloner      | 101ms    | 72ms           | 0.5ms             | 0.5ms                     |
| CollectGlobalDI | -        | -              | 63ms              | 13ms                      |
|-----------------+----------+----------------+-------------------+---------------------------|
| Speed up        | 1x       | 1.4x           | 4.5x              | 18x                       |

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample cpp file with time trace to get the avg. duration of the pass and inner scopes.

github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

adrian-prantl (Collaborator) left a comment

This sounds like generally a good idea and the implementation looks clean as well.
What are the behavior changes we expect from this patch series?

artempyanykh (Contributor, Author) commented Sep 27, 2024

> This sounds like generally a good idea and the implementation looks clean as well.

Thanks for reviewing @adrian-prantl!

> What are the behavior changes we expect from this patch series?

  • I expect no functional changes, only faster compile times for code with coroutines.
  • The new DebugInfoCache analysis is not conditioned on the presence of coroutines in a module, but it is cheap to run, and its results can be used to speed up function cloning in places other than coroutine splitting.
  • I ran a compile time benchmark on a few thousand internal source files:
    • There were no perf regressions; code without coroutines saw a neutral compile-time impact.
    • Source files that use coroutines compiled 7% faster on average, with many cases 25%, 50%, and even 65% faster. The larger the module and the more coroutines it has, the greater the speed-up.

felipepiovezan (Contributor)

Thanks for working on this and for making each commit reviewable / committable on its own; that's really great work!
I will start looking at them ASAP, wish I had seen this sooner! :)

artempyanykh (Contributor, Author)

Thanks @felipepiovezan! There's some commentary around the commits in this discourse topic, in case it's useful.

artempyanykh (Contributor, Author)

@adrian-prantl, @felipepiovezan have you had a chance to take a closer look? Could you advise on what would be the best way to proceed with reviewing this PR?

adrian-prantl (Collaborator) left a comment

@felipepiovezan Can you apply this patch to the LLVM in swiftlang and run the tests with it as a sanity check?

felipepiovezan (Contributor) commented Oct 17, 2024

Hi @artempyanykh, while I follow Adrian's suggestions, you could extract the first commit into a separate PR and we'll start reviewing/merging the NFC stuff one by one

felipepiovezan (Contributor)

> Hi @artempyanykh, while I follow Adrian's suggestions, you could extract the first commit into a separate PR and we'll start reviewing/merging the NFC stuff one by one

The second commit can also be a PR on its own right now, it doesn't depend on the first one AFAICT, so we can review them in parallel


// Copy all attributes other than those stored in the AttributeList. We need
// to remap the parameter indices of the AttributeList.
Contributor:

I'm not sure this comment is making a lot of sense as a function-level comment. To be honest even the original comment was confusing: which AttributeList? If you know the answer to this, could you take the chance provided by this NFC commit to improve the comment?

Contributor:

This second commit could also use a brief commit message describing why this change is useful

artempyanykh (Contributor, Author):

I think the comment refers to the AttributeList field (the only such) of a Function which has e.g. parameters and return value attributes. I updated the comment in #112976 to (hopefully) make it a bit clearer.

DISubprogram *SPClonedWithinModule = nullptr;
if (Changes < CloneFunctionChangeType::DifferentModule) {
  SPClonedWithinModule = F.getSubprogram();
}
Contributor:

LLVM's coding guidelines prescribe not using {} around single-line blocks.
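
That is, the quoted snippet would become:

```cpp
DISubprogram *SPClonedWithinModule = nullptr;
if (Changes < CloneFunctionChangeType::DifferentModule)
  SPClonedWithinModule = F.getSubprogram();
```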

@@ -205,6 +205,11 @@ void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
                               const char *NameSuffix = "",
                               ClonedCodeInfo *CodeInfo = nullptr);

/// Process debug information from function's subprogram attachment.
DISubprogram *ProcessSubprogramAttachment(const Function &F,
Contributor:

Could you elaborate on "Process"? This is more about collecting debug info, right?

artempyanykh (Contributor, Author):

That's right. This function hydrates the passed DebugInfoFinder by visiting relevant components of the function's subprogram attachment. I'll update the name/comment when we get to extracting that commit in a separate PR.

felipepiovezan (Contributor) commented Oct 21, 2024

Sounds good, now that the other two PRs are open & approved, let's proceed with this commit!

artempyanykh (Contributor, Author):

Great, yes! @felipepiovezan a silly question -- now that those PRs are approved, how/when do they get merged? Stacking unmerged PRs on top of each other doesn't have the best UX in GitHub.

felipepiovezan (Contributor) commented Oct 24, 2024

So I think the best option here is to merge the two PRs that are approved (#112976 and #112948). Both of those stand on their own and don't depend on each other, so it's fine to merge them in whatever order, right?

And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you grabbed more than one commit), so it will look fine on top of main. My understanding is that the fourth commit depends on the third, though.

Regarding what to do about this umbrella PR in the meantime, you have two options: leave it as is (with the understanding that it is not meant to be merged, merely to serve as a reference for all the work in the proposal), or rebase on top of main, which will get rid of the two already-merged commits, and force push to your branch artempyanykh:fast-coro-upstream, which will mean this PR serves as a reference of the work that's left to be merged, also with the understanding that we're not merging it.

With all that, we'd never have to stack unmerged PRs on top of each other. Does that make sense?

Contributor:

By the way, do you have commit access?

artempyanykh (Contributor, Author):

> So I think the best option here is to merge the two PRs that are approved (#112976 and #112948). Both of those stand on their own and don't depend on each other, so it's fine to merge them in whatever order, right?

That's right, yes.

> And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you grabbed more than one commit), so it will look fine on top of main. My understanding is that the fourth commit depends on the third, though.

After the first 2 are merged, extracting the third one in a separate PR is straightforward. With the rest of the stack there's quite a bit of sequencing, so we may need to rinse and repeat.

> rebase on top of main, which will get rid of the two already-merged commits and force push to your branch artempyanykh:fast-coro-upstream, which will mean this PR serves as a reference of the work that's left to be merged, also with the understanding that we're not merging it

+1 to this. I was planning to keep rebasing this stack anyway to make sure things keep working end-to-end while the chunks are being extracted and merged.

> By the way, do you have commit access?

No, not sure what the process for this is. Why?

Contributor:

Sorry for the delay, the LLVM dev conference happened last week and we were mostly out.

> No, not sure what the process for this is. Why?

Just wondering if you needed help merging the open PRs. I've merged the first two for you.
The process for getting commit access is described here: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
If you don't want to do that, you can always let the reviewers know you need someone to press the merge button for you.

artempyanykh (Contributor, Author)

@felipepiovezan I extracted the first 2 commits into #112948 and #112976.

felipepiovezan pushed a commit that referenced this pull request Oct 30, 2024
A helper (2 overloads) that consolidates CoroCloner creation and the
actual cloning. The helpers create a TimeTraceScope to make it easier to
see how long the cloning takes.

Extracted from #109032 (commit 1)
felipepiovezan pushed a commit that referenced this pull request Oct 30, 2024
…to (#112976)

This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.

Extracted from #109032 (commit 2)