-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coro] Amortize debug info processing cost in CoroSplit #109032
base: main
Are you sure you want to change the base?
Conversation
Summary: A helper (2 overloads) that consolidates corocloner creation and the actual cloning. The helpers create a TimeTraceScope to make it easier to see how long the cloning takes. Test Plan: ninja check-llvm-unit
Summary: Consolidate the logic in a single function. We do an extra pass over Instructions but this is necessary to untangle things and extract metadata cloning in a future diff. Test Plan: ninja check-llvm-unit
Summary: There was a single usage of CloneBasicBlock with non-default DebugInfoFinder inside CloneFunctionInto which has been refactored in the previous diff. Let's remove the parameter to keep the scope of the function more focused. Test Plan: ninja check-llvm-unit
…neFunctionInto Summary: This makes the flow of the function a bit more straightforward and makes it easier to extract more into helper functions. Test Plan: ninja check-llvm-unit
Summary: Extract the logic to build up a metadta map to use in metadata cloning into a separate function. Test Plan: ninja check-llvm-unit
Summary: One potentially contentious point in the new function's API is that it expects the caller ot populate the VMap rather than doing metadata cloning wholesale inside it. We'll need it this way for a subsequent change. Test Plan: ninja check-llvm-unit
Summary: This and previously extracted functions will be used in a later diff. Test Plan: ninja check-llvm-unit
Summary: Previously, we'd add all SPs distinct from the cloned one into a set. Then when cloning a local scope we'd check if it's from one of those 'distinct' SPs by checking if it's in the set. We don't need to do that. We can just check against the cloned SP directly and drop the set. Test Plan: ninja check-llvm-unit
Summary: The typedef was there probably because the type alias in ValueMap was private. Test Plan: ninja check-llvm-unit
Summary: To avoid cloning 'global' debug info, CloneFunction implementation used to eagerly identity map a known subset of global debug into into ValueMap's MD map. In larger modules with meaningful volume of debug info this gets very expensive. By passing such global metadata via an IdentityMD set for the ValueMapper to map on first use, we get several benefits: 1. Mapping metadata is not cheap, particularly because of tracking. When cloning a Function we identity map lots of global module-level metadata to avoid cloning it, while only a fraction of it is actually used by the function. Mapping on first use is a lot faster for modules with meaningful amount of debug info. 2. Eagerly identity mapping metadata makes it harder to cache module-level data (e.g. a set of metadata nodes in a \a DICompileUnit). With this patch we can cache certain module-level metadata calculations to speed things up further. Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up CoroSplitPass which is one of the heavier users of cloning: | | Baseline | IdentityMD set | |-----------------+----------+----------------| | CoroSplitPass | 306ms | 221ms | | CoroCloner | 101ms | 72ms | |-----------------+----------+----------------| | Speed up | 1x | 1.4x | Test Plan: ninja check-llvm-unit ninja check-llvm
…utine clones Summary: CoroCloner, by calling into CloneFunctionInto, does a lot of repeated work priming DIFinder and building a list of global debug info metadata. For programs compiled with full debug info this gets very expensive. This diff builds the data once and shares it between all clones. Anecdata for a sample cpp source file compiled with full debug info: | | Baseline | IdentityMD set | Prebuilt GlobalDI (cur.) | |-----------------+----------+----------------+--------------------------| | CoroSplitPass | 306ms | 221ms | 68ms | | CoroCloner | 101ms | 72ms | 0.5ms | | CollectGlobalDI | - | - | 63ms | |-----------------+----------+----------------+--------------------------| | Speed up | 1x | 1.4x | 4.5x | Note that CollectGlobalDI happens once *per coroutine* rather than per clone. Test Plan: ninja check-llvm-unit ninja check-llvm Compiled a sample internal source file, checked time trace output for scope timings.
Summary: The analysis simply primes and caches DebugInfoFinders for each DICompileUnit in a module. This allows (future) callers like CoroSplitPass to compute global debug info metadata (required for coroutine function cloning) much faster. Specifically, pay the price of DICompileUnit processing only once per compile unit, rather than once per coroutine. Test Plan: Added a smoke test for the new analysis ninja check-llvm-unit
Summary: We can use a DebugInfoFinder from DebugInfoCache which is already primed on a compile unit to speed up collection of global debug info. The pass could likely be another 2x+ faster if we avoid rebuilding the set of global debug info. This needs further massaging of CloneFunction and ValueMapper, though, and can be done incrementally on top of this. Comparing performance of CoroSplitPass at various points in this stack, this is anecdata from a sample cpp file compiled with full debug info: | | Baseline | IdentityMD set | Prebuilt GlobalDI | Cached CU DIFinder (cur.) | |-----------------+----------+----------------+-------------------+---------------------------| | CoroSplitPass | 306ms | 221ms | 68ms | 17ms | | CoroCloner | 101ms | 72ms | 0.5ms | 0.5ms | | CollectGlobalDI | - | - | 63ms | 13ms | |-----------------+----------+----------------+-------------------+---------------------------| | Speed up | 1x | 1.4x | 4.5x | 18x | Test Plan: ninja check-llvm-unit ninja check-llvm Compiled a sample cpp file with time trace to get the avg. duration of the pass and inner scopes.
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like generally a good idea and the implementation looks clean as well.
What are the behavior changes we expect from this patch series?
Thanks for reviewing @adrian-prantl!
|
Thanks for working on this and for making each commit reviewable / commitable on its own, that's really great work! |
Thanks @felipepiovezan! There's some commentary around the commits in this discourse topic, in case it's useful. |
@adrian-prantl, @felipepiovezan have you had a chance to take a closer look? Could you advise on what would be the best way to proceed with reviewing this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felipepiovezan Can you apply this patch to the LLVM in swiftlang and run the tests with it as a sanity check?
Hi @artempyanykh, while I follow Adrian's suggestions, you could extract the first commit into a separate PR and we'll start reviewing/merging the NFC stuff one by one |
The second commit can also be a PR on its own right now, it doesn't depend on the first one AFAICT, so we can review them in parallel |
|
||
// Copy all attributes other than those stored in the AttributeList. We need | ||
// to remap the parameter indices of the AttributeList. | ||
// Copy all attributes other than those stored in the AttributeList. We need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this comment is making a lot of sense as a function-level comment. To be honest even the original comment was confusing: which AttributeList? If you know the answer to this, could you take the chance provided by this NFC commit to improve the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second commit could also use a brief commit message describing why this change is useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment refers to the AttributeList
field (the only such) of a Function which has e.g. parameters and return value attributes. I updated the comment in #112976 to (hopefully) make it a bit clearer.
DISubprogram *SPClonedWithinModule = nullptr; | ||
if (Changes < CloneFunctionChangeType::DifferentModule) { | ||
SPClonedWithinModule = F.getSubprogram(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM's coding guidelines prescribe not using {}
around single line blocks
@@ -205,6 +205,11 @@ void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc, | |||
const char *NameSuffix = "", | |||
ClonedCodeInfo *CodeInfo = nullptr); | |||
|
|||
/// Process debug information from function's subprogram attachment. | |||
DISubprogram *ProcessSubprogramAttachment(const Function &F, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on "Process"? This is more about collecting debug info, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. This function hydrates the passed DebugInfoFinder by visiting relevant components of the function's subprogram attachment. I'll update the name/comment when we get to extracting that commit in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, now that the other two PRs are open & approved, let's proceed with this commit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, yes! @felipepiovezan a silly question -- now that those PRs are approved, how/when do they get merged? Stacking unmerged PRs on top of each other doesn't have the best UX in Github.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the best option here is to merge the two PRs that are approved (#112976 and #112948) . Both of those stand on their own and don't depend on each other, so it's fine to merge in w/e order, right?
And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you did the () before), so it will look fine on top of main
. My understanding is that the fourth commit depends on the third though.
Regarding what to do about this umbrella PR in the mean time, you have two options: leave this as is (with the understanding that it is not meant to be merged, merely serve as a reference for all the work in the proposal), or rebase on top of main, which will get rid of the two-already merged commits and force push to your branch artempyanykh:fast-coro-upstream
, which will mean this PR serves a reference of the work that's left to be merged, also with the understand that we're not merging it.
With all that stack, we'd never have to stack unmerged PRs on top of each other, does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, do you have commit access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the best option here is to merge the two PRs that are approved (#112976 and #112948) . Both of those stand on their own and don't depend on each other, so it's fine to merge in w/e order, right?
That's right, yes.
And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you did the () before), so it will look fine on top of main. My understanding is that the fourth commit depends on the third though.
After the first 2 are merged, extracting the third one in a separate PR is straightforward. With the rest of the stack there's quite a bit of sequencing, so we may need to rinse and repeat.
rebase on top of main, which will get rid of the two-already merged commits and force push to your branch artempyanykh:fast-coro-upstream, which will mean this PR serves a reference of the work that's left to be merged, also with the understand that we're not merging it
+1 to this. I was planning to keep rebasing this stack anyway to make sure things keep working end-to-end while the chunks are being extracted and merged.
By the way, do you have commit access?
No, not sure what the process for this is. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, the LLVM dev conference happened last week and we were mostly out.
No, not sure what the process for this is. Why?
Just wondering if you needed help merging the open PRs. I've merged the first two for you.
The process for getting commit access is described here: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
If you don't want to do that, you can always let the reviewers know you need someone to press the merge button for you
@felipepiovezan I extracted the first 2 commits into #112948 and #112976. |
A helper (2 overloads) that consolidates corocloner creation and the actual cloning. The helpers create a TimeTraceScope to make it easier to see how long the cloning takes. Extracted from #109032 (commit 1)
More details about this stack are in this topic on discourse.
TL;DR: In large modules with a bunch of coroutines the cost of debug info metadata processing can get very high (e.g. adding over a minute to the compile time). This stack does some refactoring around function cloning and caches some module-level calculations to speed up the pass.
Anecdata for a sample C++ source file:
Each commit is individually buildable and reviewable. I can extract them into separate PRs if the overall direction makes sense.