Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SampleFDO] Stale profile call-graph matching #95135

Merged
merged 25 commits into from
Jul 17, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Jun 11, 2024

Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as int foo() --> long foo() ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile.

Some noteworthy details:

  1. Extend the LCS based CFG level matching to identify new function.

    • Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return.
    • In LCS, the equal function check is replaced by functionMatchesProfile.
    • Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function.
  2. Determine the matching by call-site anchor similarity check.

    • A new function functionMatchesProfile(IRFunc, ProfFunc) is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: Similarity = |equalSet * 2| / (|A| + |B|). The profile name is marked as renamed if the similarity is above a threshold(-func-profile-similarity-threshold)
  3. Process the matching in top-down function order

    • when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results.
    • ProfileNameToFuncMap is used to save or cache the matching result.
  4. Update the original profile at the end using ProfileNameToFuncMap.

  5. Added a new switch --salvage-unused-profile to control this, default is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)

@wlei-llvm wlei-llvm marked this pull request as ready for review June 11, 2024 16:21
@wlei-llvm
Copy link
Contributor Author

The new top-down order stuff is a big change, there are too many old comments on the old PR((#92151), so create a new PR to be easy for the code review.

assert(FunctionId(IRFunc.getName()) != ProfileFuncName &&
"IR function should be different from profile function to match");

// Check whether profile function is new.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can a profile function be new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks for caching this, "new" means it doesn't show in the IR, will update the comment.

@WenleiHe
Copy link
Member

The change summary is outdated?

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked closely. But this does feel cleaner than previous implementation. Thanks for working through this.

std::shared_ptr<ProfileSymbolList> PSL)
: M(M), Reader(Reader), ProbeManager(ProbeManager), LTOPhase(LTOPhase),
SymbolMap(&SymMap), PSL(PSL) {};
void runOnModule(std::vector<Function *> &OrderedFuncList);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably better not to have runOnModule not depend on a passed in ordered list. Given a module, the top-down order is deterministic. Depending on passed in order can be error prone as it's not guaranteed to be top-down.

We can have SampleProfileMatcher ctor take CG, and build top down order from within runOnModule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.(that needs some refactoring to reuse the code from SampleLoader)

@@ -127,8 +147,37 @@ void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
}
}

LocToLocMap SampleProfileMatcher::longestCommonSequence(
const AnchorList &AnchorList1, const AnchorList &AnchorList2) const {
bool SampleProfileMatcher::isFunctionEqual(const FunctionId &IRFuncName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a good name. should this still be called functionMatchesProfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to functionMatchesProfile

@wlei-llvm
Copy link
Contributor Author

The change summary is outdated?

The summary is updated.

Copy link

github-actions bot commented Jun 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the algorithm need to be iterative? For instance, at top level we have profile function 'foo' calling profile function bar, and IR function 'foo_new' calling IR function 'bar_new'. Can the algorithm recursively match bar with bar_new? If yes, perhaps add a test case.

if (FS)
continue;

// For extended binary, functions are fully inlined may not be loaded in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove 'are'

continue;
FilteredIRAnchorsList.emplace_back(I);
}
getFilteredAnchorList(IRAnchors, ProfileAnchors, FilteredIRAnchorsList,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to land refactoring changes first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #95830.

AnchorList1[X].second == AnchorList2[Y].second)
functionMatchesProfile(AnchorList1[X].second,
AnchorList2[Y].second,
!MatchUnusedFunction))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negation is slightly confusing. Perhaps add a comment : !MatchUnusedFunction /* matched functions only */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added

// use IR anchor as base(A side) to align with the order of
// IRToProfileLocationMap.
LocToLocMap MatchedAnchors = longestCommonSequence(
FilteredIRAnchorsList, FilteredProfileAnchorList, RunCGMatching);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable naming mismatches: the argument is RunCGMatching, but callee func parameter is 'MatchUnusedFuntion'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the caller RunCGMatching is to be consistent with RunCFGMatching, added a comment along with the argument to clarify.

@@ -155,6 +156,21 @@ static inline bool skipProfileForFunction(const Function &F) {
return F.isDeclaration() || !F.hasFnAttribute("use-sample-profile");
}

static inline void
buildTopDownFuncOrder(LazyCallGraph &CG,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function actually yields bottom-up order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the name

Comment on lines 890 to 891
std::vector<Function *> SampleProfileMatcher::buildTopDownFuncOrder() {
std::vector<Function *> FunctionOrderList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very efficient to have a local vector, then return a copy of it, especially given ::buildTopDownFuncOrder already takes a vector. If we make ::buildTopDownFuncOrder actually return top-down order instead of bottom-up order, the reverse won't be needed. then this wrapper function won't be necessary at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper function is for sharing the code with the same features in SampleLoader(see https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1934-L1942). And I searched around, seems the SCC API doesn't support directly top-down order (see https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/LazyCallGraph.h#L973 that there is only postorder_ref_sccs but no preorder_ref_sccs things)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still make a buildTopDownFuncOrder function and move the reverse inside that function?

For the other use case, we can hoist the reverse into the if.

Comment on lines 104 to 105
uint64_t NumRecoveredUnusedSamples = 0;
uint64_t NumRecoveredUnusedFunc = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the names above, how about:

NumCallGraphRecoveredProfiledFunc
NumCallGraphRecoveredFuncSamples

continue;

LLVM_DEBUG(dbgs() << "Function " << CanonFName
<< " is not in profile or symbol list table.\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

symbol list table -> profile symbol list

if (ProfileNameToFuncMap.empty())
return;
for (auto &P : Reader.getProfiles())
updateProfileWithNewName(P.second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we found a profile name - function name match based on callsite names, but the profile exists but wasn't loaded because with extbinary format we only load profile for functions in the current module by names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not supported yet, I planned to do it in a new PR, this is not a trivial diff as this needs some work for SampleProfileReader, right now the reader don't support to on-demand read profiles

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO added

}
}

void SampleProfileMatcher::updateProfilesAndSymbolMap() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the last piece of this change that feels not good. Have you tried to intercept all profile look up to consider alternative names? How intrusive that will be? Has the remapper mechanism already identified all the places that we need to intercept?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remapper related code should cover the places to update the new profile name.

Searched around by the keyword lookUpNameInProfile: https://github.com/search?q=repo%3Allvm%2Fllvm-project%20lookUpNameInProfile&type=code

Looks two places are needed to change:

  1. In SampleProf.cpp, all findFunctionSamplesAt(...) functions , lvm/lib/ProfileData/SampleProf.cpp

we need to extend a new parameter to the function, like:

const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
    const LineLocation &Loc, StringRef CalleeName,
    SampleProfileReaderItaniumRemapper *Remapper) const {...}

To

const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
    const LineLocation &Loc, StringRef CalleeName,
    SampleProfileReaderItaniumRemapper *Remapper, 
      HashKeyMap<std::unordered_map, FunctionId, Function *> &FuncToProfileNameMap) const {...}

Then insert the logic if it's not found in the original map, then find in the FuncToProfileNameMap.

  1. SampleProfile.cpp is to support the OutlineFunctionSamples find.

So I feels it's not straightforward like the location map(IRToProfileLocationMap) but it's also not too bad, How do you feel about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually feels simpler than all the current updating logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to look up on-demand in sample load using the matching.

Comment on lines 168 to 171
// Check whether profile function appears in IR.
auto F = SymbolMap->find(ProfileFuncName);
if (F != SymbolMap->end())
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is because if the profile already matches a function with the same name, we don't want it to be matched to another function? If so, I understand the intention, but a match is match. Perhaps the logic can be split into isProfileUnused(hence cannot be matched to another function) + functionMatchesProfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated to isProfileUnused, thanks.

Comment on lines 160 to 163
// Check whether IR function appears in profile.
auto R = NewIRFunctions.find(IRFuncName);
if (R == NewIRFunctions.end() || !R->second)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as below, a match is a match, perhaps this can be split into functionHasProfile (hence no need to match) + functionMatchesProfile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not only to do the check, but also find the function that doesn't have profile("new " function). Maybe change to FindIfFunctionIsNew? if the return type is not nullptr, then call functionMatchesProfile

Comment on lines 30 to 31
cl::desc("The profile matches the function if their similarity is above "
"the given number(percentage)."));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reword the description: Consider a profile matches a function if the similarity of their callee sequences is above the specified percentile.

Comment on lines +744 to +755
// For probe-based function, we first trust the checksum info. If the checksum
// doesn't match, we continue checking for similarity.
if (FunctionSamples::ProfileIsProbeBased) {
const auto *FuncDesc = ProbeManager->getDesc(IRFunc);
if (FuncDesc &&
!ProbeManager->profileIsHashMismatched(*FuncDesc, *FSFlattened)) {
LLVM_DEBUG(dbgs() << "The checksums for " << IRFunc.getName()
<< "(IR) and " << ProfFunc << "(Profile) match.\n");

return true;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be above MinFuncCountForCGMatching/MinFuncCountForCGMatching check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intentional, the reason is checksum may not be reliable for tiny function too( I saw some cases the block num is only one but the function are different).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, update the comment to reflect that. The current comment only mentioned similarity check, not checksum.

// Similarity check may not be reliable if the function is tiny

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated.

@@ -189,7 +238,9 @@ LocToLocMap SampleProfileMatcher::longestCommonSequence(
X = V[Index(K - 1)] + 1;
Y = X - K;
while (X < Size1 && Y < Size2 &&
AnchorList1[X].second == AnchorList2[Y].second)
functionMatchesProfile(AnchorList1[X].second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one problem with this -- we're treating similarity based fuzzy matching the same as the exact name matching, and that could lead to undesired result. Ideally, we should have exact matching (by name) done first, then fuzzy match the rest to produce profile name - function name pair.

Copy link
Contributor Author

@wlei-llvm wlei-llvm Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to run the LCS twice? the first time for the exact name matching, and the second time is for the similarity based fuzzy matching based on the first LCS's unmatched results.
Take an extreme example:

IR:         New_A.    New_B.   New_C,     D,   E 
profile:    D,   E,  Unused_A,  Unused_B,  Unused_C

if we want to matched in two LCSs:
The first LCS's result is [D, E] then the unmatched results is

IR:         New_A.    New_B.   New_C,  
profile:  Unused_A,  Unused_B,  Unused_C

and the second LCS result is [A, B, C]. (match. new_XXX. to Unused_XXX)

However, if we apply the matching result,(like we strip the prefix and reran the LCS ), the finial result is actually [A, B, C].

and this is same to the current version, i,e, If we match them together. the LCS is still [A, B, C]

Note that we only match the prefix function to prefix function(New_XXX to Unused_XXX), and match non-prefix to non-prefix function, there is no cross match, cross match is always false here because we check and make sure only match for the profile and function doesn't show in either side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Sounds good.

@wlei-llvm
Copy link
Contributor Author

wlei-llvm commented Jun 18, 2024

Does the algorithm need to be iterative? For instance, at top level we have profile function 'foo' calling profile function bar, and IR function 'foo_new' calling IR function 'bar_new'. Can the algorithm recursively match bar with bar_new? If yes, perhaps add a test case.

It works to handle the recursive function matching. It's not strictly iterative but works like the iterative way by processing the functions in a top-down order and the callee's processing can reuse the caller's matching result. For that example, say the top-down order is
main --> foo_new -->bar_new
It first process main, and run matching on main's callee sequences(which includes foo_new), it identifies foo_new --> foo, Note that once main's process is done, it will return and doesn't recursively process callee in main's process.
Then it start a new instance to process foo_new, which is found to match foo and then it run on its callee sequences matching to identify bar_new --> bar

The tricky part here is when a new function is matched to a top-level function in the profile, right now the reader doesn't load the matched profile(which is pre-loaded before the matching), we need some work for this in the future. I will add a test for non-top-level profile.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Better for wenlen@ to approve the patch.

Comment on lines +865 to +875
// Update the data in SampleLoader.
if (SalvageUnusedProfile)
for (auto &I : FuncToProfileNameMap) {
assert(I.first && "New function is null");
FunctionId FuncName(I.first->getName());
FuncNameToProfNameMap->emplace(FuncName, I.second);
// We need to remove the old entry to avoid duplicating the function
// processing.
SymbolMap->erase(FuncName);
SymbolMap->emplace(I.second, I.first);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot cleaner than updating everything for all function/inlinee samples with name mapping :)

const FunctionId &ProfileFuncName,
bool FindMatchedProfileOnly);
// Determine if the function matches profile by computing a similarity ratio
// between two callsite anchors extracted from function and profile. If it's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: two callsite anchors -> two sequences of callsite anchors

Comment on lines 167 to 169
// If IR function doesn't have profile and the profile is unused, try
// matching them.
Function *IRFunc = findIfFunctionIsNew(IRFuncName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and the function name findIfFunctionIsNew do not agree. Here we are indeed checking to see if a function has profile, not whether a function is new.

When we don't have profiled symbol list, findNewIRFunctions can get us a list of functions without profile only, because we don't know what's new. I'm wondering if we should rename all these things to emphasize on function without profile instead of new?

Also functionHasProfile and isProfileUnused seem more symmetrical and easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will change
findNewIRFunctions -> findFunctionsWithoutProfile
findIfFunctionIsNew -> functionHasProfile
Thanks for the suggestion!

@@ -189,7 +238,9 @@ LocToLocMap SampleProfileMatcher::longestCommonSequence(
X = V[Index(K - 1)] + 1;
Y = X - K;
while (X < Size1 && Y < Size2 &&
AnchorList1[X].second == AnchorList2[Y].second)
functionMatchesProfile(AnchorList1[X].second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Sounds good.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a nit. Please also make sure compile time is ok with this patch.

LocToLocMap SampleProfileMatcher::longestCommonSequence(
const AnchorList &AnchorList1, const AnchorList &AnchorList2) const {
Function *
SampleProfileMatcher::functionHasProfile(const FunctionId &IRFuncName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: with the name functionHasProfile, the return should be a boolean, and Function* can be an out parameter.

@wlei-llvm wlei-llvm merged commit 18cdfa7 into llvm:main Jul 17, 2024
5 of 6 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 17, 2024

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/2012

Here is the relevant piece of the build log for the reference:

Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[36/38] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/InOneWeekend-hip-6.0.2.dir/workload/ray-tracing/InOneWeekend/main.cc.o -o External/HIP/InOneWeekend-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/InOneWeekend.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend.reference_output-hip-6.0.2
[37/38] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[38/38] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 6 tests, 6 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 6)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 392.57s

Total Discovered Tests: 6
  Passed: 5 (83.33%)
  Failed: 1 (16.67%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 6 tests, 6 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 6)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 392.57s

Total Discovered Tests: 6
  Passed: 5 (83.33%)
  Failed: 1 (16.67%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
program finished with exit code 1
elapsedTime=538.866704

sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function. 
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2.  Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3.  Process the matching in top-down function order 
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function. 
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2.  Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3.  Process the matching in top-down function order 
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants