-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SampleFDO] Stale profile call-graph matching #95135
Conversation
The new top-down order stuff is a big change, there are too many old comments on the old PR((#92151), so create a new PR to be easy for the code review. |
assert(FunctionId(IRFunc.getName()) != ProfileFuncName && | ||
"IR function should be different from profile function to match"); | ||
|
||
// Check whether profile function is new. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can a profile function be new?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks for caching this, "new" means it doesn't show in the IR, will update the comment.
The change summary is outdated? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked closely. But this does feel cleaner than previous implementation. Thanks for working through this.
std::shared_ptr<ProfileSymbolList> PSL) | ||
: M(M), Reader(Reader), ProbeManager(ProbeManager), LTOPhase(LTOPhase), | ||
SymbolMap(&SymMap), PSL(PSL) {}; | ||
void runOnModule(std::vector<Function *> &OrderedFuncList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better not to have runOnModule
not depend on a passed in ordered list. Given a module, the top-down order is deterministic. Depending on passed in order can be error prone as it's not guaranteed to be top-down.
We can have SampleProfileMatcher
ctor take CG, and build top down order from within runOnModule
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.(that needs some refactoring to reuse the code from SampleLoader)
@@ -127,8 +147,37 @@ void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS, | |||
} | |||
} | |||
|
|||
LocToLocMap SampleProfileMatcher::longestCommonSequence( | |||
const AnchorList &AnchorList1, const AnchorList &AnchorList2) const { | |||
bool SampleProfileMatcher::isFunctionEqual(const FunctionId &IRFuncName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a good name. should this still be called functionMatchesProfile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to functionMatchesProfile
The summary is updated. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the algorithm need to be iterative? For instance, at top level we have profile function 'foo' calling profile function bar, and IR function 'foo_new' calling IR function 'bar_new'. Can the algorithm recursively match bar with bar_new? If yes, perhaps add a test case.
if (FS) | ||
continue; | ||
|
||
// For extended binary, functions are fully inlined may not be loaded in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 'are'
continue; | ||
FilteredIRAnchorsList.emplace_back(I); | ||
} | ||
getFilteredAnchorList(IRAnchors, ProfileAnchors, FilteredIRAnchorsList, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible to land refactoring changes first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in #95830.
AnchorList1[X].second == AnchorList2[Y].second) | ||
functionMatchesProfile(AnchorList1[X].second, | ||
AnchorList2[Y].second, | ||
!MatchUnusedFunction)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negation is slightly confusing. Perhaps add a comment : !MatchUnusedFunction /* matched functions only */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added
// use IR anchor as base(A side) to align with the order of | ||
// IRToProfileLocationMap. | ||
LocToLocMap MatchedAnchors = longestCommonSequence( | ||
FilteredIRAnchorsList, FilteredProfileAnchorList, RunCGMatching); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable naming mismatches: the argument is RunCGMatching, but callee func parameter is 'MatchUnusedFuntion'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the caller RunCGMatching
is to be consistent with RunCFGMatching
, added a comment along with the argument to clarify.
@@ -155,6 +156,21 @@ static inline bool skipProfileForFunction(const Function &F) { | |||
return F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"); | |||
} | |||
|
|||
static inline void | |||
buildTopDownFuncOrder(LazyCallGraph &CG, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function actually yields bottom-up order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the name
std::vector<Function *> SampleProfileMatcher::buildTopDownFuncOrder() { | ||
std::vector<Function *> FunctionOrderList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not very efficient to have a local vector, then return a copy of it, especially given ::buildTopDownFuncOrder
already takes a vector. If we make ::buildTopDownFuncOrder
actually return top-down order instead of bottom-up order, the reverse won't be needed. then this wrapper function won't be necessary at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrapper function is for sharing the code with the same features in SampleLoader
(see https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1934-L1942). And I searched around, seems the SCC API doesn't support directly top-down order (see https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/LazyCallGraph.h#L973 that there is only postorder_ref_sccs
but no preorder_ref_sccs
things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we still make a buildTopDownFuncOrder
function and move the reverse
inside that function?
For the other use case, we can hoist the reverse
into the if
.
uint64_t NumRecoveredUnusedSamples = 0; | ||
uint64_t NumRecoveredUnusedFunc = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the names above, how about:
NumCallGraphRecoveredProfiledFunc
NumCallGraphRecoveredFuncSamples
continue; | ||
|
||
LLVM_DEBUG(dbgs() << "Function " << CanonFName | ||
<< " is not in profile or symbol list table.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
symbol list table
-> profile symbol list
if (ProfileNameToFuncMap.empty()) | ||
return; | ||
for (auto &P : Reader.getProfiles()) | ||
updateProfileWithNewName(P.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we found a profile name - function name match based on callsite names, but the profile exists but wasn't loaded because with extbinary format we only load profile for functions in the current module by names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not supported yet, I planned to do it in a new PR, this is not a trivial diff as this needs some work for SampleProfileReader, right now the reader don't support to on-demand read profiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO added
} | ||
} | ||
|
||
void SampleProfileMatcher::updateProfilesAndSymbolMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the last piece of this change that feels not good. Have you tried to intercept all profile look up to consider alternative names? How intrusive that will be? Has the remapper mechanism already identified all the places that we need to intercept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remapper
related code should cover the places to update the new profile name.
Searched around by the keyword lookUpNameInProfile: https://github.com/search?q=repo%3Allvm%2Fllvm-project%20lookUpNameInProfile&type=code
Looks two places are needed to change:
- In SampleProf.cpp, all
findFunctionSamplesAt(...)
functions , lvm/lib/ProfileData/SampleProf.cpp
we need to extend a new parameter to the function, like:
const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
const LineLocation &Loc, StringRef CalleeName,
SampleProfileReaderItaniumRemapper *Remapper) const {...}
To
const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
const LineLocation &Loc, StringRef CalleeName,
SampleProfileReaderItaniumRemapper *Remapper,
HashKeyMap<std::unordered_map, FunctionId, Function *> &FuncToProfileNameMap) const {...}
Then insert the logic if it's not found in the original map, then find in the FuncToProfileNameMap
.
- SampleProfile.cpp is to support the OutlineFunctionSamples find.
So I feels it's not straightforward like the location map(IRToProfileLocationMap
) but it's also not too bad, How do you feel about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually feels simpler than all the current updating logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to look up on-demand in sample load using the matching.
// Check whether profile function appears in IR. | ||
auto F = SymbolMap->find(ProfileFuncName); | ||
if (F != SymbolMap->end()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is because if the profile already matches a function with the same name, we don't want it to be matched to another function? If so, I understand the intention, but a match is match. Perhaps the logic can be split into isProfileUnused
(hence cannot be matched to another function) + functionMatchesProfile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated to isProfileUnused
, thanks.
// Check whether IR function appears in profile. | ||
auto R = NewIRFunctions.find(IRFuncName); | ||
if (R == NewIRFunctions.end() || !R->second) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below, a match is a match, perhaps this can be split into functionHasProfile
(hence no need to match) + functionMatchesProfile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not only to do the check, but also find the function that doesn't have profile("new " function). Maybe change to FindIfFunctionIsNew
? if the return
type is not nullptr, then call functionMatchesProfile
cl::desc("The profile matches the function if their similarity is above " | ||
"the given number(percentage).")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: reword the description: Consider a profile matches a function if the similarity of their callee sequences is above the specified percentile.
// For probe-based function, we first trust the checksum info. If the checksum | ||
// doesn't match, we continue checking for similarity. | ||
if (FunctionSamples::ProfileIsProbeBased) { | ||
const auto *FuncDesc = ProbeManager->getDesc(IRFunc); | ||
if (FuncDesc && | ||
!ProbeManager->profileIsHashMismatched(*FuncDesc, *FSFlattened)) { | ||
LLVM_DEBUG(dbgs() << "The checksums for " << IRFunc.getName() | ||
<< "(IR) and " << ProfFunc << "(Profile) match.\n"); | ||
|
||
return true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be above MinFuncCountForCGMatching
/MinFuncCountForCGMatching
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is intentional, the reason is checksum may not be reliable for tiny function too( I saw some cases the block num is only one but the function are different).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, update the comment to reflect that. The current comment only mentioned similarity check, not checksum.
// Similarity check may not be reliable if the function is tiny
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated.
@@ -189,7 +238,9 @@ LocToLocMap SampleProfileMatcher::longestCommonSequence( | |||
X = V[Index(K - 1)] + 1; | |||
Y = X - K; | |||
while (X < Size1 && Y < Size2 && | |||
AnchorList1[X].second == AnchorList2[Y].second) | |||
functionMatchesProfile(AnchorList1[X].second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one problem with this -- we're treating similarity based fuzzy matching the same as the exact name matching, and that could lead to undesired result. Ideally, we should have exact matching (by name) done first, then fuzzy match the rest to produce profile name - function name pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to run the LCS twice? the first time for the exact name matching, and the second time is for the similarity based fuzzy matching based on the first LCS's unmatched results.
Take an extreme example:
IR: New_A. New_B. New_C, D, E
profile: D, E, Unused_A, Unused_B, Unused_C
if we want to matched in two LCSs:
The first LCS's result is [D, E] then the unmatched results is
IR: New_A. New_B. New_C,
profile: Unused_A, Unused_B, Unused_C
and the second LCS result is [A, B, C]. (match. new_XXX. to Unused_XXX)
However, if we apply the matching result,(like we strip the prefix and reran the LCS ), the finial result is actually [A, B, C].
and this is same to the current version, i,e, If we match them together. the LCS is still [A, B, C]
Note that we only match the prefix function to prefix function(New_XXX to Unused_XXX), and match non-prefix to non-prefix function, there is no cross match, cross match is always false here because we check and make sure only match for the profile and function doesn't show in either side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. Sounds good.
It works to handle the recursive function matching. It's not strictly iterative but works like the iterative way by processing the functions in a top-down order and the callee's processing can reuse the caller's matching result. For that example, say the top-down order is The tricky part here is when a new function is matched to a top-level function in the profile, right now the reader doesn't load the matched profile(which is pre-loaded before the matching), we need some work for this in the future. I will add a test for non-top-level profile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Better for wenlen@ to approve the patch.
// Update the data in SampleLoader. | ||
if (SalvageUnusedProfile) | ||
for (auto &I : FuncToProfileNameMap) { | ||
assert(I.first && "New function is null"); | ||
FunctionId FuncName(I.first->getName()); | ||
FuncNameToProfNameMap->emplace(FuncName, I.second); | ||
// We need to remove the old entry to avoid duplicating the function | ||
// processing. | ||
SymbolMap->erase(FuncName); | ||
SymbolMap->emplace(I.second, I.first); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot cleaner than updating everything for all function/inlinee samples with name mapping :)
const FunctionId &ProfileFuncName, | ||
bool FindMatchedProfileOnly); | ||
// Determine if the function matches profile by computing a similarity ratio | ||
// between two callsite anchors extracted from function and profile. If it's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: two callsite anchors
-> two sequences of callsite anchors
// If IR function doesn't have profile and the profile is unused, try | ||
// matching them. | ||
Function *IRFunc = findIfFunctionIsNew(IRFuncName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment and the function name findIfFunctionIsNew
do not agree. Here we are indeed checking to see if a function has profile, not whether a function is new.
When we don't have profiled symbol list, findNewIRFunctions
can get us a list of functions without profile only, because we don't know what's new. I'm wondering if we should rename all these things to emphasize on function without profile instead of new?
Also functionHasProfile
and isProfileUnused
seem more symmetrical and easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Will change
findNewIRFunctions
-> findFunctionsWithoutProfile
findIfFunctionIsNew
-> functionHasProfile
Thanks for the suggestion!
@@ -189,7 +238,9 @@ LocToLocMap SampleProfileMatcher::longestCommonSequence( | |||
X = V[Index(K - 1)] + 1; | |||
Y = X - K; | |||
while (X < Size1 && Y < Size2 && | |||
AnchorList1[X].second == AnchorList2[Y].second) | |||
functionMatchesProfile(AnchorList1[X].second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with a nit. Please also make sure compile time is ok with this patch.
LocToLocMap SampleProfileMatcher::longestCommonSequence( | ||
const AnchorList &AnchorList1, const AnchorList &AnchorList2) const { | ||
Function * | ||
SampleProfileMatcher::functionHasProfile(const FunctionId &IRFuncName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: with the name functionHasProfile
, the return should be a boolean, and Function*
can be an out parameter.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/2012 Here is the relevant piece of the build log for the reference:
|
Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as `int foo()` --> `long foo()` ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable. This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. Some noteworthy details: 1. Extend the LCS based CFG level matching to identify new function. - Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return. - In LCS, the equal function check is replaced by `functionMatchesProfile`. - Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function. 2. Determine the matching by call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold(`-func-profile-similarity-threshold`) 3. Process the matching in top-down function order - when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this, default is false. Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)
Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as `int foo()` --> `long foo()` ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable. This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. Some noteworthy details: 1. Extend the LCS based CFG level matching to identify new function. - Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return. - In LCS, the equal function check is replaced by `functionMatchesProfile`. - Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function. 2. Determine the matching by call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold(`-func-profile-similarity-threshold`) 3. Process the matching in top-down function order - when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this, default is false. Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)
Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as
int foo()
-->long foo()
) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable.This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile.
Some noteworthy details:
Extend the LCS based CFG level matching to identify new function.
functionMatchesProfile
.Determine the matching by call-site anchor similarity check.
functionMatchesProfile(IRFunc, ProfFunc)
is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define:Similarity = |equalSet * 2| / (|A| + |B|)
. The profile name is marked as renamed if the similarity is above a threshold(-func-profile-similarity-threshold
)Process the matching in top-down function order
ProfileNameToFuncMap
is used to save or cache the matching result.Update the original profile at the end using
ProfileNameToFuncMap
.Added a new switch --salvage-unused-profile to control this, default is false.
Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)