[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness can be caused by function renaming. Because the sample
profile loader relies on exact string matching, a trivial change in the
function signature (such as `int foo()` --> `long foo()`) can change the
mangled name, which makes the function profile (including all nested
children profiles) unavailable.
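
To make the failure mode concrete, here is a minimal sketch of exact-name
lookup. The `ProfileMap` alias and the `lookupExact` helper are hypothetical
stand-ins, not the loader's real types. The keys `_Z3fooi` and `_Z3fool` are
the Itanium-mangled names of `foo(int)` and `foo(long)`, chosen only to
illustrate a signature change.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a profile: mangled name -> total sample count.
using ProfileMap = std::unordered_map<std::string, uint64_t>;

// Exact string lookup: any difference in the mangled name is a miss.
// Returns a pointer to the sample count, or nullptr if the name is absent.
const uint64_t *lookupExact(const ProfileMap &Profile,
                            const std::string &MangledName) {
  auto It = Profile.find(MangledName);
  return It == Profile.end() ? nullptr : &It->second;
}
```

With a profile collected for `_Z3fooi`, `lookupExact(Profile, "_Z3fool")`
returns `nullptr` once the parameter type changes, even if the function body
is otherwise unchanged.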

This patch introduces stale profile call-graph level matching, which aims
to identify trivial function renamings and reuse the old function
profile.

Some noteworthy details:

1. Extend the LCS-based CFG level matching to identify new functions.
- Extend the matching to handle a function and a profile that have
different names, instead of requiring an exact function name match. This
leverages LCS: while finding callsite anchor matches, when two function
names differ, try matching the functions instead of returning.
- In LCS, the function equality check is replaced by
`functionMatchesProfile`.
- Only try matching new functions (an IR function without a profile, or a
profile without an IR function). This reduces the matching scope, since
functions that already matched do not need to be matched again.
2. Determine the matching by a call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` checks for
renaming of a candidate <IRFunc, ProfFunc> pair. It uses the LCS (diff)
matching to compute the equal set, and we define: `Similarity =
2 * |equalSet| / (|A| + |B|)`, where A and B are the callsite anchor
sequences of the function and the profile. The profile name is marked as
renamed if the similarity is above a threshold
(`-func-profile-similarity-threshold`).

3. Process the matching in top-down function order.
- When matching for a caller is done, the new function names are saved
for later use; using top-down order maximizes reuse of the results.
- `ProfileNameToFuncMap` is used to save (cache) the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Add a new switch `--salvage-unused-profile` to control this; the
default is false.
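
The similarity check in item 2 can be sketched as follows. This is an
illustration under stated assumptions, not the patch's implementation: it
uses the classic dynamic-programming LCS rather than the greedy diff
algorithm the matcher uses, compares callee names directly, and takes the
threshold as a fraction in [0, 1]. The names `AnchorNames`, `lcsLength`,
`anchorSimilarity` and `looksRenamed` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callee names at the callsites of a function, in location order.
using AnchorNames = std::vector<std::string>;

// Length of the longest common subsequence of A and B, computed with the
// classic O(|A| * |B|) dynamic-programming table.
std::size_t lcsLength(const AnchorNames &A, const AnchorNames &B) {
  std::vector<std::vector<std::size_t>> DP(
      A.size() + 1, std::vector<std::size_t>(B.size() + 1, 0));
  for (std::size_t I = 1; I <= A.size(); ++I)
    for (std::size_t J = 1; J <= B.size(); ++J)
      DP[I][J] = A[I - 1] == B[J - 1]
                     ? DP[I - 1][J - 1] + 1
                     : std::max(DP[I - 1][J], DP[I][J - 1]);
  return DP[A.size()][B.size()];
}

// Similarity = 2 * |equalSet| / (|A| + |B|), a value in [0, 1].
double anchorSimilarity(const AnchorNames &A, const AnchorNames &B) {
  if (A.empty() && B.empty())
    return 0.0; // Assumption: no anchors means no evidence of a match.
  return 2.0 * lcsLength(A, B) / (A.size() + B.size());
}

// Treat the pair as a rename when the similarity is above the threshold.
// Assumption: Threshold is a fraction in [0, 1].
bool looksRenamed(const AnchorNames &IRAnchors, const AnchorNames &ProfAnchors,
                  double Threshold) {
  assert(Threshold >= 0.0 && Threshold <= 1.0);
  return anchorSimilarity(IRAnchors, ProfAnchors) > Threshold;
}
```

For IR anchors `{bar, baz, qux, quux}` and profile anchors
`{bar, baz, quux}`, the LCS has length 3, so the similarity is 6/7 (about
0.86). With few anchors the ratio moves in large steps, which is consistent
with the note below that small anchor counts can produce incorrect pairs.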

Verified on one of Meta's large internal services, and confirmed that
90%+ of the renaming pairs found are good. (There can be incorrect
renaming pairs if the number of anchors is small, but we checked that
those functions are simple cold functions.)
wlei-llvm authored Jul 17, 2024
1 parent 81955da commit 18cdfa7
Showing 14 changed files with 1,064 additions and 127 deletions.
23 changes: 14 additions & 9 deletions llvm/include/llvm/ProfileData/SampleProf.h
@@ -919,12 +919,14 @@ class FunctionSamples {
/// Returns a pointer to FunctionSamples at the given callsite location
/// \p Loc with callee \p CalleeName. If no callsite can be found, relax
/// the restriction to return the FunctionSamples at callsite location
/// \p Loc with the maximum total sample count. If \p Remapper is not
/// nullptr, use \p Remapper to find FunctionSamples with equivalent name
/// as \p CalleeName.
const FunctionSamples *
findFunctionSamplesAt(const LineLocation &Loc, StringRef CalleeName,
SampleProfileReaderItaniumRemapper *Remapper) const;
/// \p Loc with the maximum total sample count. If \p Remapper or \p
/// FuncNameToProfNameMap is not nullptr, use them to find FunctionSamples
/// with equivalent name as \p CalleeName.
const FunctionSamples *findFunctionSamplesAt(
const LineLocation &Loc, StringRef CalleeName,
SampleProfileReaderItaniumRemapper *Remapper,
const HashKeyMap<std::unordered_map, FunctionId, FunctionId>
*FuncNameToProfNameMap = nullptr) const;

bool empty() const { return TotalSamples == 0; }

@@ -1172,11 +1174,14 @@ class FunctionSamples {
/// tree nodes in the profile.
///
/// \returns the FunctionSamples pointer to the inlined instance.
/// If \p Remapper is not nullptr, it will be used to find matching
/// FunctionSamples with not exactly the same but equivalent name.
/// If \p Remapper or \p FuncNameToProfNameMap is not nullptr, it will be used
/// to find matching FunctionSamples with not exactly the same but equivalent
/// name.
const FunctionSamples *findFunctionSamples(
const DILocation *DIL,
SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;
SampleProfileReaderItaniumRemapper *Remapper = nullptr,
const HashKeyMap<std::unordered_map, FunctionId, FunctionId>
*FuncNameToProfNameMap = nullptr) const;

static bool ProfileIsProbeBased;

115 changes: 101 additions & 14 deletions llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -26,6 +26,7 @@ using AnchorMap = std::map<LineLocation, FunctionId>;
class SampleProfileMatcher {
Module &M;
SampleProfileReader &Reader;
LazyCallGraph &CG;
const PseudoProbeManager *ProbeManager;
const ThinOrFullLTOPhase LTOPhase;
SampleProfileMap FlattenedProfiles;
@@ -58,6 +59,40 @@ class SampleProfileMatcher {
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>>
FuncCallsiteMatchStates;

struct FuncToProfileNameMapHash {
uint64_t
operator()(const std::pair<const Function *, FunctionId> &P) const {
return hash_combine(P.first, P.second);
}
};
// A map from a pair of function and profile name to a boolean value
// indicating whether they are matched. This is used as a cache for the
// matching result.
std::unordered_map<std::pair<const Function *, FunctionId>, bool,
FuncToProfileNameMapHash>
FuncProfileMatchCache;
// The new functions found by the call graph matching. The map's key is the
// the new(renamed) function pointer and the value is old(unused) profile
// name.
std::unordered_map<Function *, FunctionId> FuncToProfileNameMap;

// A map pointer to the FuncNameToProfNameMap in SampleProfileLoader,
// which maps the function name to the matched profile name. This is used
// for sample loader to look up profile using the new name.
HashKeyMap<std::unordered_map, FunctionId, FunctionId> *FuncNameToProfNameMap;

// A map pointer to the SymbolMap in SampleProfileLoader, which stores all
// the original matched symbols before the matching. this is to determine if
// the profile is unused(to be matched) or not.
HashKeyMap<std::unordered_map, FunctionId, Function *> *SymbolMap;

// The new functions from IR.
HashKeyMap<std::unordered_map, FunctionId, Function *>
FunctionsWithoutProfile;

// Pointer to the Profile Symbol List in the reader.
std::shared_ptr<ProfileSymbolList> PSL;

// Profile mismatch statstics:
uint64_t TotalProfiledFunc = 0;
// Num of checksum-mismatched function.
@@ -72,34 +107,61 @@ class SampleProfileMatcher {
uint64_t MismatchedCallsiteSamples = 0;
uint64_t RecoveredCallsiteSamples = 0;

// Profile call-graph matching statstics:
uint64_t NumCallGraphRecoveredProfiledFunc = 0;
uint64_t NumCallGraphRecoveredFuncSamples = 0;

// A dummy name for unknown indirect callee, used to differentiate from a
// non-call instruction that also has an empty callee name.
static constexpr const char *UnknownIndirectCallee =
"unknown.indirect.callee";

public:
SampleProfileMatcher(Module &M, SampleProfileReader &Reader,
const PseudoProbeManager *ProbeManager,
ThinOrFullLTOPhase LTOPhase)
: M(M), Reader(Reader), ProbeManager(ProbeManager), LTOPhase(LTOPhase){};
SampleProfileMatcher(
Module &M, SampleProfileReader &Reader, LazyCallGraph &CG,
const PseudoProbeManager *ProbeManager, ThinOrFullLTOPhase LTOPhase,
HashKeyMap<std::unordered_map, FunctionId, Function *> &SymMap,
std::shared_ptr<ProfileSymbolList> PSL,
HashKeyMap<std::unordered_map, FunctionId, FunctionId>
&FuncNameToProfNameMap)
: M(M), Reader(Reader), CG(CG), ProbeManager(ProbeManager),
LTOPhase(LTOPhase), FuncNameToProfNameMap(&FuncNameToProfNameMap),
SymbolMap(&SymMap), PSL(PSL) {};
void runOnModule();
void clearMatchingData() {
// Do not clear FuncMappings, it stores IRLoc to ProfLoc remappings which
// will be used for sample loader.
FuncCallsiteMatchStates.clear();
// Do not clear FlattenedProfiles as it contains function names referenced
// by FuncNameToProfNameMap. Clearing this memory could lead to a
// use-after-free error.
freeContainer(FuncCallsiteMatchStates);
freeContainer(FunctionsWithoutProfile);
freeContainer(FuncToProfileNameMap);
}

private:
FunctionSamples *getFlattenedSamplesFor(const Function &F) {
StringRef CanonFName = FunctionSamples::getCanonicalFnName(F);
auto It = FlattenedProfiles.find(FunctionId(CanonFName));
FunctionSamples *getFlattenedSamplesFor(const FunctionId &Fname) {
auto It = FlattenedProfiles.find(Fname);
if (It != FlattenedProfiles.end())
return &It->second;
return nullptr;
}
FunctionSamples *getFlattenedSamplesFor(const Function &F) {
StringRef CanonFName = FunctionSamples::getCanonicalFnName(F);
return getFlattenedSamplesFor(FunctionId(CanonFName));
}
template <typename T> inline void freeContainer(T &C) {
T Empty;
std::swap(C, Empty);
}
void getFilteredAnchorList(const AnchorMap &IRAnchors,
const AnchorMap &ProfileAnchors,
AnchorList &FilteredIRAnchorsList,
AnchorList &FilteredProfileAnchorList);
void runOnFunction(Function &F);
void findIRAnchors(const Function &F, AnchorMap &IRAnchors);
void findProfileAnchors(const FunctionSamples &FS, AnchorMap &ProfileAnchors);
void findIRAnchors(const Function &F, AnchorMap &IRAnchors) const;
void findProfileAnchors(const FunctionSamples &FS,
AnchorMap &ProfileAnchors) const;
// Record the callsite match states for profile staleness report, the result
// is saved in FuncCallsiteMatchStates.
void recordCallsiteMatchStates(const Function &F, const AnchorMap &IRAnchors,
@@ -124,6 +186,9 @@ class SampleProfileMatcher {
State == MatchState::RemovedMatch;
};

void countCallGraphRecoveredSamples(
const FunctionSamples &FS,
std::unordered_set<FunctionId> &MatchedUnusedProfile);
// Count the samples of checksum mismatched function for the top-level
// function and all inlinees.
void countMismatchedFuncSamples(const FunctionSamples &FS, bool IsTopLevel);
@@ -151,15 +216,37 @@ class SampleProfileMatcher {
// parts from the resulting SES are used to remap the IR locations to the
// profile locations. As the number of function callsite is usually not big,
// we currently just implements the basic greedy version(page 6 of the paper).
LocToLocMap
longestCommonSequence(const AnchorList &IRCallsiteAnchors,
const AnchorList &ProfileCallsiteAnchors) const;
LocToLocMap longestCommonSequence(const AnchorList &IRCallsiteAnchors,
const AnchorList &ProfileCallsiteAnchors,
bool MatchUnusedFunction);
void matchNonCallsiteLocs(const LocToLocMap &AnchorMatchings,
const AnchorMap &IRAnchors,
LocToLocMap &IRToProfileLocationMap);
void runStaleProfileMatching(const Function &F, const AnchorMap &IRAnchors,
const AnchorMap &ProfileAnchors,
LocToLocMap &IRToProfileLocationMap);
LocToLocMap &IRToProfileLocationMap,
bool RunCFGMatching, bool RunCGMatching);
// If the function doesn't have profile, return the pointer to the function.
bool functionHasProfile(const FunctionId &IRFuncName,
Function *&FuncWithoutProfile);
bool isProfileUnused(const FunctionId &ProfileFuncName);
bool functionMatchesProfileHelper(const Function &IRFunc,
const FunctionId &ProfFunc);
// Determine if the function matches profile. If FindMatchedProfileOnly is
// set, only search the existing matched function. Otherwise, try matching the
// two functions.
bool functionMatchesProfile(const FunctionId &IRFuncName,
const FunctionId &ProfileFuncName,
bool FindMatchedProfileOnly);
// Determine if the function matches profile by computing a similarity ratio
// between two sequences of callsite anchors extracted from function and
// profile. If it's above the threshold, the function matches the profile.
bool functionMatchesProfile(Function &IRFunc, const FunctionId &ProfFunc,
bool FindMatchedProfileOnly);
// Find functions that don't show in the profile or profile symbol list,
// which are supposed to be new functions. We use them as the targets for
// call graph matching.
void findFunctionsWithoutProfile();
void reportOrPersistProfileStats();
};
} // end namespace llvm
17 changes: 17 additions & 0 deletions llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
@@ -22,6 +22,7 @@
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LazyCallGraph.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"
@@ -155,6 +156,22 @@ static inline bool skipProfileForFunction(const Function &F) {
return F.isDeclaration() || !F.hasFnAttribute("use-sample-profile");
}

static inline void
buildTopDownFuncOrder(LazyCallGraph &CG,
std::vector<Function *> &FunctionOrderList) {
CG.buildRefSCCs();
for (LazyCallGraph::RefSCC &RC : CG.postorder_ref_sccs()) {
for (LazyCallGraph::SCC &C : RC) {
for (LazyCallGraph::Node &N : C) {
Function &F = N.getFunction();
if (!skipProfileForFunction(F))
FunctionOrderList.push_back(&F);
}
}
}
std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());
}

template <typename FT> class SampleProfileLoaderBaseImpl {
public:
SampleProfileLoaderBaseImpl(std::string Name, std::string RemapName,
36 changes: 26 additions & 10 deletions llvm/lib/ProfileData/SampleProf.cpp
@@ -236,7 +236,9 @@ LineLocation FunctionSamples::getCallSiteIdentifier(const DILocation *DIL,
}

const FunctionSamples *FunctionSamples::findFunctionSamples(
const DILocation *DIL, SampleProfileReaderItaniumRemapper *Remapper) const {
const DILocation *DIL, SampleProfileReaderItaniumRemapper *Remapper,
const HashKeyMap<std::unordered_map, FunctionId, FunctionId>
*FuncNameToProfNameMap) const {
assert(DIL);
SmallVector<std::pair<LineLocation, StringRef>, 10> S;

@@ -256,7 +258,8 @@ const FunctionSamples *FunctionSamples::findFunctionSamples(
return this;
const FunctionSamples *FS = this;
for (int i = S.size() - 1; i >= 0 && FS != nullptr; i--) {
FS = FS->findFunctionSamplesAt(S[i].first, S[i].second, Remapper);
FS = FS->findFunctionSamplesAt(S[i].first, S[i].second, Remapper,
FuncNameToProfNameMap);
}
return FS;
}
@@ -277,19 +280,32 @@ void FunctionSamples::findAllNames(DenseSet<FunctionId> &NameSet) const {

const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
const LineLocation &Loc, StringRef CalleeName,
SampleProfileReaderItaniumRemapper *Remapper) const {
SampleProfileReaderItaniumRemapper *Remapper,
const HashKeyMap<std::unordered_map, FunctionId, FunctionId>
*FuncNameToProfNameMap) const {
CalleeName = getCanonicalFnName(CalleeName);

auto iter = CallsiteSamples.find(mapIRLocToProfileLoc(Loc));
if (iter == CallsiteSamples.end())
auto I = CallsiteSamples.find(mapIRLocToProfileLoc(Loc));
if (I == CallsiteSamples.end())
return nullptr;
auto FS = iter->second.find(getRepInFormat(CalleeName));
if (FS != iter->second.end())
auto FS = I->second.find(getRepInFormat(CalleeName));
if (FS != I->second.end())
return &FS->second;

if (FuncNameToProfNameMap && !FuncNameToProfNameMap->empty()) {
auto R = FuncNameToProfNameMap->find(FunctionId(CalleeName));
if (R != FuncNameToProfNameMap->end()) {
CalleeName = R->second.stringRef();
auto FS = I->second.find(getRepInFormat(CalleeName));
if (FS != I->second.end())
return &FS->second;
}
}

if (Remapper) {
if (auto NameInProfile = Remapper->lookUpNameInProfile(CalleeName)) {
auto FS = iter->second.find(getRepInFormat(*NameInProfile));
if (FS != iter->second.end())
auto FS = I->second.find(getRepInFormat(*NameInProfile));
if (FS != I->second.end())
return &FS->second;
}
}
@@ -300,7 +316,7 @@ const FunctionSamples *FunctionSamples::findFunctionSamplesAt(
return nullptr;
uint64_t MaxTotalSamples = 0;
const FunctionSamples *R = nullptr;
for (const auto &NameFS : iter->second)
for (const auto &NameFS : I->second)
if (NameFS.second.getTotalSamples() >= MaxTotalSamples) {
MaxTotalSamples = NameFS.second.getTotalSamples();
R = &NameFS.second;