Extend kernel-info to emit PGO-based FLOP count #110586

Draft
wants to merge 137 commits into
base: main
530eb98
Add profiling functions to libomptarget
EthanLuisMcDonough Dec 16, 2023
fb067d4
Fix PGO instrumentation for GPU targets
EthanLuisMcDonough Dec 16, 2023
7a0e0ef
Change global visibility on GPU targets
EthanLuisMcDonough Dec 21, 2023
fddc079
Make names global public on GPU
EthanLuisMcDonough Dec 23, 2023
e9db03c
Read and print GPU device PGO globals
EthanLuisMcDonough Dec 29, 2023
aa83bd2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Dec 29, 2023
e468760
Fix rebase bug
EthanLuisMcDonough Jan 3, 2024
ec18ce9
Refactor portions to be more idiomatic
EthanLuisMcDonough Jan 3, 2024
0872556
Reformat DeviceRTL prof functions
EthanLuisMcDonough Jan 3, 2024
94f47f3
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 3, 2024
62f31d1
Style changes + catch name error
EthanLuisMcDonough Jan 9, 2024
0c4bbeb
Add GPU PGO test
EthanLuisMcDonough Jan 18, 2024
c7ae2a7
Fix PGO test formatting
EthanLuisMcDonough Jan 18, 2024
9e66bfb
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 19, 2024
8bb2207
Refactor visibility logic
EthanLuisMcDonough Jan 19, 2024
9f13943
Add LLVM instrumentation support
EthanLuisMcDonough Jan 24, 2024
b28d4a9
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 24, 2024
23d7fe2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 14, 2024
0606f0d
Use explicit addrspace instead of unqual
EthanLuisMcDonough Feb 14, 2024
23f75b2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 15, 2024
c1f9be3
Remove redundant namespaces
EthanLuisMcDonough Feb 16, 2024
721dac6
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 16, 2024
6a3ae40
Clang format
EthanLuisMcDonough Feb 16, 2024
6866862
Use getAddrSpaceCast
EthanLuisMcDonough Feb 16, 2024
62a5ee1
Revert "Use getAddrSpaceCast"
EthanLuisMcDonough Feb 27, 2024
052394f
Revert "Use getAddrSpaceCast"
EthanLuisMcDonough Feb 27, 2024
612d5a5
Write PGO
EthanLuisMcDonough Mar 1, 2024
b8c9163
Fix tests
EthanLuisMcDonough Mar 14, 2024
e572452
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Mar 14, 2024
4568c42
Fix arguments
EthanLuisMcDonough Mar 14, 2024
d86b101
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Mar 19, 2024
1fc4cb9
Add GPU prof flags
EthanLuisMcDonough Mar 19, 2024
849b244
Fix elf obj file
EthanLuisMcDonough Mar 19, 2024
55bd8d2
Add GPU use profile option
EthanLuisMcDonough Mar 19, 2024
7231080
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Apr 6, 2024
4ebbb45
Add more addrspace casts for GPU targets
EthanLuisMcDonough May 7, 2024
4be80e5
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 7, 2024
b2fe222
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 7, 2024
7770b37
Fix params
EthanLuisMcDonough May 7, 2024
702d170
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 7, 2024
619fb69
Resolve merge conflict
EthanLuisMcDonough May 7, 2024
f6a1545
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 9, 2024
92260d8
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 9, 2024
58491a7
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 9, 2024
6267c2a
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 11, 2024
3f08ae9
Have test read from profraw instead of dump
EthanLuisMcDonough May 11, 2024
09f2b39
Remove debug dump
EthanLuisMcDonough May 11, 2024
1dbde8e
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 13, 2024
1278989
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 13, 2024
ff8f233
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 13, 2024
ed2a289
Merge branch 'main' into gpuprof_ptrcastfix
EthanLuisMcDonough May 13, 2024
aa895a1
Fix elf obj file
EthanLuisMcDonough Mar 19, 2024
2031e49
Add more addrspace casts for GPU targets
EthanLuisMcDonough May 7, 2024
5de6082
Merge branch 'gpuprof_ptrcastfix' into gpuprofwrite
EthanLuisMcDonough May 13, 2024
3e43a18
Merge branch 'gpuprof_ptrcastfix' into gpuprofdriver
EthanLuisMcDonough May 13, 2024
be6524b
Have test read from profraw instead of dump
EthanLuisMcDonough May 13, 2024
000deed
Merge branch 'gpuprofwrite' into gpuprofdriver
EthanLuisMcDonough May 13, 2024
e266cc7
Fix GPU PGO names
EthanLuisMcDonough May 17, 2024
c754f7f
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 24, 2024
2b8eb29
Fix PGO test format
EthanLuisMcDonough May 25, 2024
67f3009
Refactor profile writer
EthanLuisMcDonough May 25, 2024
1cec247
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 25, 2024
cee07bc
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 27, 2024
e8ad132
Fix refactor bug
EthanLuisMcDonough May 27, 2024
9e23b08
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough May 28, 2024
1e8fafc
Merge branch 'gpuprofwrite' into gpuprofdriver
EthanLuisMcDonough May 28, 2024
79bf08e
Check for level in test case
EthanLuisMcDonough May 28, 2024
4c9f814
Make requested clang-format change
EthanLuisMcDonough May 28, 2024
e187f5a
Merge branch 'gpuprofwrite' into gpuprofdriver
EthanLuisMcDonough May 28, 2024
cfe1660
Check for version global on GPU
EthanLuisMcDonough May 30, 2024
5bf4376
Add host/device combination test
EthanLuisMcDonough May 31, 2024
2530137
Add PGO dump debug option
EthanLuisMcDonough May 31, 2024
f9138fb
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Jun 1, 2024
79ceacb
Tighten PGO test requirements
EthanLuisMcDonough Jun 1, 2024
ff0dd62
Add note about PGO debug flag
EthanLuisMcDonough Jun 1, 2024
0b9cc35
Fix clang format
EthanLuisMcDonough Jun 4, 2024
bf5dbd6
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Jun 23, 2024
90a6e30
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Aug 10, 2024
f9a24e3
Update test requirements
EthanLuisMcDonough Aug 10, 2024
6eb137e
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Aug 10, 2024
5a671f6
[KernelInfo] Implement new LLVM IR pass for GPU code analysis
jdenny-ornl Aug 12, 2024
a7656de
Move docs to KernelInfo.rst
jdenny-ornl Aug 12, 2024
d92856e
Move conditional outside registration call
jdenny-ornl Aug 12, 2024
5727284
Merge changes
EthanLuisMcDonough Aug 12, 2024
6ac3f41
Use llvm::SmallString
jdenny-ornl Aug 12, 2024
6367ad7
Use TTI.getFlatAddressSpace for addrspace(0)
jdenny-ornl Aug 12, 2024
78446bb
Avoid repetition between amdgpu and nvptx tests
jdenny-ornl Aug 12, 2024
fede524
Use named values in tests
jdenny-ornl Aug 12, 2024
4c30b8a
Say flat address space instead of addrspace(0)
jdenny-ornl Aug 13, 2024
33f0d4d
Cache the flat address space
jdenny-ornl Aug 13, 2024
a2a512c
Link KernelInfo.rst from Passes.rst
jdenny-ornl Aug 13, 2024
de04ac4
Don't filter out cpus
jdenny-ornl Aug 13, 2024
ec5d2bd
Include less in header
jdenny-ornl Aug 16, 2024
c06b905
Removed unused comparison operators
jdenny-ornl Aug 16, 2024
d83d22a
Remove redundant null check
jdenny-ornl Aug 16, 2024
1649cf8
Move KernelInfo to KernelInfo.cpp, remove KernelInfoAnalysis
jdenny-ornl Aug 16, 2024
1a3c0ae
Use printAsOperand not getName to identify instruction
jdenny-ornl Aug 16, 2024
ea89a81
Use printAsOperand to report indirect callee
jdenny-ornl Aug 16, 2024
8da602b
Report inline assembly calls
jdenny-ornl Aug 16, 2024
45114fd
Use llvm::SmallString
jdenny-ornl Aug 16, 2024
eea139c
Use llvm::SmallString
jdenny-ornl Aug 16, 2024
8bf6e4e
getKernelInfo -> emitKernelInfo because return is unused
jdenny-ornl Aug 16, 2024
d2ee05d
Merge branch 'main' into kernel-info-pr
jdenny-ornl Aug 21, 2024
9b865f4
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 5, 2024
39979f7
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 12, 2024
62d494d
Clean up launch bounds
jdenny-ornl Sep 13, 2024
e4d3fca
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 16, 2024
94d90d1
Adjust forEachLaunchBound param
jdenny-ornl Sep 16, 2024
762a217
Reuse Function::getFnAttributeAsParsedInteger
jdenny-ornl Sep 16, 2024
df66a3d
Move forEachLaunchBound to TargetTransformInfo
jdenny-ornl Sep 16, 2024
5488764
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 26, 2024
3f63d53
forEachLaunchBound -> collectLaunchBounds
jdenny-ornl Sep 26, 2024
0658a21
Merge branch 'main' into gpuprofdriver
EthanLuisMcDonough Sep 27, 2024
f5d9f55
Rebase updates
EthanLuisMcDonough Sep 28, 2024
e246227
Hack offload tests to find built llvm-profdata
jdenny-ornl Sep 27, 2024
3b6ce07
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 28, 2024
feeaa37
Remove redundant private
jdenny-ornl Sep 28, 2024
557dd16
Merge branch 'pr-94268-fixup' into kernel-info-pgo
jdenny-ornl Sep 28, 2024
d2847b0
Extend kernel-info to emit PGO-based FLOP count
jdenny-ornl Sep 30, 2024
0672e2c
Merge branch 'main' into kernel-info-pgo
jdenny-ornl Oct 3, 2024
e04b933
Improve some kernel-info instruction remarks
jdenny-ornl Oct 3, 2024
b9b95a2
Merge branch 'main' into kernel-info-pr
jdenny-ornl Oct 11, 2024
116f1c9
Remove todos, as requested
jdenny-ornl Oct 11, 2024
2094465
Combine registerFullLinkTimeOptimizationLastEPCallback calls
jdenny-ornl Oct 11, 2024
39bce7c
collectLaunchBounds -> collectKernelLaunchBounds
jdenny-ornl Oct 11, 2024
14345cf
Spell kernel-info properties like their IR attributes
jdenny-ornl Oct 11, 2024
ad393d2
Replace -kernel-info-end-lto with -no-kernel-info-end-lto
jdenny-ornl Oct 11, 2024
d3beccf
Apply clang-format
jdenny-ornl Oct 11, 2024
5a4b873
Avoid auto, as requested
jdenny-ornl Oct 14, 2024
571181b
For function name, use debug info or keep @
jdenny-ornl Oct 14, 2024
cfda91d
Merge branch 'kernel-info-pr' into kernel-info-pgo
jdenny-ornl Oct 15, 2024
a5ce547
Use anonymous namespace
jdenny-ornl Oct 16, 2024
4d60911
Remove currently unused capabilities, as requested
jdenny-ornl Oct 16, 2024
0c30e7c
Rename test files without LLVM IR to .test
jdenny-ornl Oct 16, 2024
f5a6fbd
Regenerate OpenMP tests from current clang
jdenny-ornl Oct 17, 2024
baad223
Include LLVM value name in alloca report
jdenny-ornl Oct 17, 2024
28a5bcb
Merge branch 'kernel-info-pr' into kernel-info-pgo
jdenny-ornl Oct 18, 2024
11 changes: 11 additions & 0 deletions clang/include/clang/Driver/Options.td
@@ -1790,6 +1790,9 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate_gpu : Flag<["-"], "fprofile-instr-generate-gpu">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented GPU device code to collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate_EQ : Joined<["-"], "fprofile-instr-generate=">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<file>">,
HelpText<"Generate instrumented code to collect execution counts into <file> (overridden by LLVM_PROFILE_FILE env var)">;
@@ -1826,6 +1829,9 @@ def fmcdc_max_test_vectors_EQ : Joined<["-"], "fmcdc-max-test-vectors=">,
def fprofile_generate : Flag<["-"], "fprofile-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_generate_gpu : Flag<["-"], "fprofile-generate-gpu">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented GPU device code to collect execution counts into GPU_TARGET.default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_generate_EQ : Joined<["-"], "fprofile-generate=">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
MetaVarName<"<directory>">,
@@ -1844,6 +1850,11 @@ def fprofile_use_EQ : Joined<["-"], "fprofile-use=">,
Visibility<[ClangOption, CLOption]>,
MetaVarName<"<pathname>">,
HelpText<"Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from <pathname>/default.profdata. Otherwise, it reads from file <pathname>.">;
def fprofile_use_gpu_EQ : Joined<["-"], "fprofile-use-gpu=">,
Group<f_Group>,
Visibility<[ClangOption, CLOption]>,
MetaVarName<"<pathname>">,
HelpText<"Use instrumentation data for profile-guided optimization targeting GPU">;
def fno_profile_instr_generate : Flag<["-"], "fno-profile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Disable generation of profile instrumentation.">;
63 changes: 30 additions & 33 deletions clang/lib/Driver/ToolChain.cpp
@@ -196,10 +196,9 @@ bool ToolChain::defaultToIEEELongDouble() const {
return PPC_LINUX_DEFAULT_IEEELONGDOUBLE && getTriple().isOSLinux();
}

static void getAArch64MultilibFlags(const Driver &D,
const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,
Multilib::flags_list &Result) {
static void getAArch64MultilibFlags(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,
Multilib::flags_list &Result) {
std::vector<StringRef> Features;
tools::aarch64::getAArch64TargetFeatures(D, Triple, Args, Features, false);
const auto UnifiedFeatures = tools::unifyTargetFeatures(Features);
@@ -234,10 +233,9 @@ static void getAArch64MultilibFlags(const Driver &D,
}
}

static void getARMMultilibFlags(const Driver &D,
const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,
Multilib::flags_list &Result) {
static void getARMMultilibFlags(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,
Multilib::flags_list &Result) {
std::vector<StringRef> Features;
llvm::ARM::FPUKind FPUKind = tools::arm::getARMTargetFeatures(
D, Triple, Args, Features, false /*ForAs*/, true /*ForMultilib*/);
@@ -353,7 +351,7 @@ ToolChain::getSanitizerArgs(const llvm::opt::ArgList &JobArgs) const {
return SanArgs;
}

const XRayArgs& ToolChain::getXRayArgs() const {
const XRayArgs &ToolChain::getXRayArgs() const {
if (!XRayArguments)
XRayArguments.reset(new XRayArgs(*this, Args));
return *XRayArguments;
@@ -447,8 +445,7 @@ static const DriverSuffix *parseDriverSuffix(StringRef ProgName, size_t &Pos) {
return DS;
}

ParsedClangName
ToolChain::getTargetAndModeFromProgramName(StringRef PN) {
ParsedClangName ToolChain::getTargetAndModeFromProgramName(StringRef PN) {
std::string ProgName = normalizeProgramName(PN);
size_t SuffixPos;
const DriverSuffix *DS = parseDriverSuffix(ProgName, SuffixPos);
@@ -459,8 +456,8 @@ ToolChain::getTargetAndModeFromProgramName(StringRef PN) {
size_t LastComponent = ProgName.rfind('-', SuffixPos);
if (LastComponent == std::string::npos)
return ParsedClangName(ProgName.substr(0, SuffixEnd), DS->ModeFlag);
std::string ModeSuffix = ProgName.substr(LastComponent + 1,
SuffixEnd - LastComponent - 1);
std::string ModeSuffix =
ProgName.substr(LastComponent + 1, SuffixEnd - LastComponent - 1);

// Infer target from the prefix.
StringRef Prefix(ProgName);
@@ -518,9 +515,7 @@ Tool *ToolChain::getFlang() const {
return Flang.get();
}

Tool *ToolChain::buildAssembler() const {
return new tools::ClangAs(*this);
}
Tool *ToolChain::buildAssembler() const { return new tools::ClangAs(*this); }

Tool *ToolChain::buildLinker() const {
llvm_unreachable("Linking is not supported by this toolchain");
@@ -891,10 +886,12 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
return false;

return Args.hasArg(options::OPT_fprofile_generate) ||
Args.hasArg(options::OPT_fprofile_generate_gpu) ||
Args.hasArg(options::OPT_fprofile_generate_EQ) ||
Args.hasArg(options::OPT_fcs_profile_generate) ||
Args.hasArg(options::OPT_fcs_profile_generate_EQ) ||
Args.hasArg(options::OPT_fprofile_instr_generate) ||
Args.hasArg(options::OPT_fprofile_instr_generate_gpu) ||
Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
Args.hasArg(options::OPT_fcreate_profile) ||
Args.hasArg(options::OPT_forder_file_instrumentation);
@@ -907,8 +904,10 @@ bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
}

Tool *ToolChain::SelectTool(const JobAction &JA) const {
if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA)) return getFlang();
if (getDriver().ShouldUseClangCompiler(JA)) return getClang();
if (D.IsFlangMode() && getDriver().ShouldUseFlangCompiler(JA))
return getFlang();
if (getDriver().ShouldUseClangCompiler(JA))
return getClang();
Action::ActionClass AC = JA.getKind();
if (AC == Action::AssembleJobClass && useIntegratedAs() &&
!getTriple().isOSAIX())
@@ -930,7 +929,7 @@ std::string ToolChain::GetLinkerPath(bool *LinkerIsLLD) const {

// Get -fuse-ld= first to prevent -Wunused-command-line-argument. -fuse-ld= is
// considered as the linker flavor, e.g. "bfd", "gold", or "lld".
const Arg* A = Args.getLastArg(options::OPT_fuse_ld_EQ);
const Arg *A = Args.getLastArg(options::OPT_fuse_ld_EQ);
StringRef UseLinker = A ? A->getValue() : CLANG_DEFAULT_LINKER;

// --ld-path= takes precedence over -fuse-ld= and specifies the executable
@@ -1015,9 +1014,7 @@ types::ID ToolChain::LookupTypeForExtension(StringRef Ext) const {
return id;
}

bool ToolChain::HasNativeLLVMSupport() const {
return false;
}
bool ToolChain::HasNativeLLVMSupport() const { return false; }

bool ToolChain::isCrossCompiling() const {
llvm::Triple HostTriple(LLVM_HOST_TRIPLE);
@@ -1029,7 +1026,8 @@ bool ToolChain::isCrossCompiling() const {
case llvm::Triple::thumb:
case llvm::Triple::thumbeb:
return getArch() != llvm::Triple::arm && getArch() != llvm::Triple::thumb &&
getArch() != llvm::Triple::armeb && getArch() != llvm::Triple::thumbeb;
getArch() != llvm::Triple::armeb &&
getArch() != llvm::Triple::thumbeb;
default:
return HostTriple.getArch() != getArch();
}
@@ -1112,9 +1110,7 @@ std::string ToolChain::ComputeEffectiveClangTriple(const ArgList &Args,
return ComputeLLVMTriple(Args, InputType);
}

std::string ToolChain::computeSysRoot() const {
return D.SysRoot;
}
std::string ToolChain::computeSysRoot() const { return D.SysRoot; }

void ToolChain::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
ArgStringList &CC1Args) const {
@@ -1138,12 +1134,12 @@ void ToolChain::addProfileRTLibs(const llvm::opt::ArgList &Args,
CmdArgs.push_back(getCompilerRTArgString(Args, "profile"));
}

ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType(
const ArgList &Args) const {
ToolChain::RuntimeLibType
ToolChain::GetRuntimeLibType(const ArgList &Args) const {
if (runtimeLibType)
return *runtimeLibType;

const Arg* A = Args.getLastArg(options::OPT_rtlib_EQ);
const Arg *A = Args.getLastArg(options::OPT_rtlib_EQ);
StringRef LibName = A ? A->getValue() : CLANG_DEFAULT_RTLIB;

// Only use "platform" in tests to override CLANG_DEFAULT_RTLIB!
@@ -1164,8 +1160,8 @@ ToolChain::RuntimeLibType ToolChain::GetRuntimeLibType(
return *runtimeLibType;
}

ToolChain::UnwindLibType ToolChain::GetUnwindLibType(
const ArgList &Args) const {
ToolChain::UnwindLibType
ToolChain::GetUnwindLibType(const ArgList &Args) const {
if (unwindLibType)
return *unwindLibType;

@@ -1200,7 +1196,8 @@ ToolChain::UnwindLibType ToolChain::GetUnwindLibType(
return *unwindLibType;
}

ToolChain::CXXStdlibType ToolChain::GetCXXStdlibType(const ArgList &Args) const{
ToolChain::CXXStdlibType
ToolChain::GetCXXStdlibType(const ArgList &Args) const {
if (cxxStdlibType)
return *cxxStdlibType;

@@ -1356,7 +1353,7 @@ void ToolChain::AddCXXStdlibLibArgs(const ArgList &Args,
void ToolChain::AddFilePathLibArgs(const ArgList &Args,
ArgStringList &CmdArgs) const {
for (const auto &LibPath : getFilePaths())
if(LibPath.length() > 0)
if (LibPath.length() > 0)
CmdArgs.push_back(Args.MakeArgString(StringRef("-L") + LibPath));
}

80 changes: 76 additions & 4 deletions clang/lib/Driver/ToolChains/Clang.cpp
@@ -584,6 +584,76 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input,
}
}

static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args,
ArgStringList &CmdArgs) {
const Driver &D = TC.getDriver();
auto *ProfileClangArg =
Args.getLastArg(options::OPT_fprofile_instr_generate_gpu,
options::OPT_fno_profile_generate);
auto *ProfileLLVMArg = Args.getLastArg(options::OPT_fprofile_generate_gpu,
options::OPT_fno_profile_generate);
auto *ProfileUseArg = Args.getLastArg(options::OPT_fprofile_use_gpu_EQ,
options::OPT_fno_profile_instr_use);

auto *HostLLVMArg = Args.getLastArgNoClaim(options::OPT_fprofile_generate,
options::OPT_fprofile_generate_EQ);
auto *HostClangArg =
Args.getLastArgNoClaim(options::OPT_fprofile_instr_generate,
options::OPT_fprofile_instr_generate_EQ);

if (ProfileClangArg &&
ProfileClangArg->getOption().matches(options::OPT_fno_profile_generate))
ProfileClangArg = nullptr;

if (ProfileLLVMArg &&
ProfileLLVMArg->getOption().matches(options::OPT_fno_profile_generate))
ProfileLLVMArg = nullptr;

if (ProfileUseArg &&
ProfileUseArg->getOption().matches(options::OPT_fno_profile_instr_use))
ProfileUseArg = nullptr;

if (ProfileClangArg && ProfileLLVMArg) {
D.Diag(diag::err_drv_argument_not_allowed_with)
<< ProfileClangArg->getSpelling() << ProfileLLVMArg->getSpelling();
return;
}

if (ProfileUseArg && ProfileClangArg) {
D.Diag(diag::err_drv_argument_not_allowed_with)
<< ProfileClangArg->getSpelling() << ProfileUseArg->getSpelling();
return;
}

if (ProfileUseArg && ProfileLLVMArg) {
D.Diag(diag::err_drv_argument_not_allowed_with)
<< ProfileLLVMArg->getSpelling() << ProfileUseArg->getSpelling();
return;
}

if (HostLLVMArg && ProfileClangArg) {
D.Diag(diag::err_drv_argument_not_allowed_with)
<< HostLLVMArg->getSpelling() << ProfileClangArg->getSpelling();
return;
}

if (HostClangArg && ProfileLLVMArg) {
D.Diag(diag::err_drv_argument_not_allowed_with)
<< HostClangArg->getSpelling() << ProfileLLVMArg->getSpelling();
return;
}

if (ProfileClangArg)
CmdArgs.push_back("-fprofile-instrument=clang");

if (ProfileLLVMArg)
CmdArgs.push_back("-fprofile-instrument=llvm");

if (ProfileUseArg)
CmdArgs.push_back(Args.MakeArgString(
Twine("-fprofile-instrument-use-path=") + ProfileUseArg->getValue()));
}

static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
const JobAction &JA, const InputInfo &Output,
const ArgList &Args, SanitizerArgs &SanArgs,
@@ -6302,10 +6372,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions,
options::OPT_fno_convergent_functions);

// NVPTX/AMDGCN doesn't support PGO or coverage. There's no runtime support
// for sampling, overhead of call arc collection is way too high and there's
// no way to collect the output.
if (!Triple.isNVPTX() && !Triple.isAMDGCN())
// NVPTX/AMDGCN PGO is handled separately. GPU targets don't have their own
// profiling libraries; their profile data is collected and written by the
// host's profiling library.
if (Triple.isNVPTX() || Triple.isAMDGCN())
addPGOFlagsGPU(TC, Args, CmdArgs);
else
addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs);

Args.AddLastArg(CmdArgs, options::OPT_fclang_abi_compat_EQ);
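
The option-conflict rules in `addPGOFlagsGPU` above can be read as a small decision table. The following is a minimal standalone sketch of that table, not the driver code itself: `GpuPgoArgs`, `GpuPgoResult`, and `resolveGpuPgoFlags` are hypothetical names introduced only for illustration, the `-fno-*` negation handling is left out, and the conflict strings are placeholders rather than the driver's real diagnostic text.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the driver arguments that
// addPGOFlagsGPU inspects. The field names are illustrative only.
struct GpuPgoArgs {
  bool InstrGenerateGpu = false;  // -fprofile-instr-generate-gpu
  bool GenerateGpu = false;       // -fprofile-generate-gpu
  std::string UseGpuPath;         // -fprofile-use-gpu=<pathname>, empty if absent
  bool HostInstrGenerate = false; // -fprofile-instr-generate[=<file>]
  bool HostGenerate = false;      // -fprofile-generate[=<directory>]
};

struct GpuPgoResult {
  std::vector<std::string> CC1Args; // flags forwarded to the device compile
  std::string Conflict;             // non-empty when two options clash
};

// Follows the order of checks in the diff: the first clashing pair is
// reported and no flags are emitted in that case.
GpuPgoResult resolveGpuPgoFlags(const GpuPgoArgs &A) {
  GpuPgoResult R;
  const bool Use = !A.UseGpuPath.empty();

  if (A.InstrGenerateGpu && A.GenerateGpu)
    R.Conflict = "-fprofile-instr-generate-gpu vs -fprofile-generate-gpu";
  else if (Use && A.InstrGenerateGpu)
    R.Conflict = "-fprofile-instr-generate-gpu vs -fprofile-use-gpu=";
  else if (Use && A.GenerateGpu)
    R.Conflict = "-fprofile-generate-gpu vs -fprofile-use-gpu=";
  else if (A.HostGenerate && A.InstrGenerateGpu)
    R.Conflict = "-fprofile-generate vs -fprofile-instr-generate-gpu";
  else if (A.HostInstrGenerate && A.GenerateGpu)
    R.Conflict = "-fprofile-instr-generate vs -fprofile-generate-gpu";
  if (!R.Conflict.empty())
    return R;

  if (A.InstrGenerateGpu)
    R.CC1Args.push_back("-fprofile-instrument=clang");
  if (A.GenerateGpu)
    R.CC1Args.push_back("-fprofile-instrument=llvm");
  if (Use)
    R.CC1Args.push_back("-fprofile-instrument-use-path=" + A.UseGpuPath);
  return R;
}
```

Under these assumptions, the table says that the device instrumentation kind must match the host's: front-end (clang) instrumentation on the device cannot be combined with IR (llvm) instrumentation on the host, and vice versa.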
2 changes: 1 addition & 1 deletion compiler-rt/include/profile/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
#define INSTR_PROF_DATA_DEFINED
#endif
INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
12 changes: 12 additions & 0 deletions compiler-rt/lib/profile/InstrProfiling.h
@@ -295,6 +295,18 @@ int __llvm_profile_get_padding_sizes_for_counters(
*/
void __llvm_profile_set_dumped(void);

/*!
* \brief Write custom target-specific profiling data to a separate file.
* Used by libomptarget for GPU PGO.
*/
int __llvm_write_custom_profile(const char *Target,
const __llvm_profile_data *DataBegin,
const __llvm_profile_data *DataEnd,
const char *CountersBegin,
const char *CountersEnd, const char *NamesBegin,
const char *NamesEnd,
const uint64_t *VersionOverride);

/*!
* This variable is defined in InstrProfilingRuntime.cpp as a hidden
* symbol. Its main purpose is to enable profile runtime user to
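
The `VersionOverride` parameter of `__llvm_write_custom_profile` pairs with the `InstrProfData.inc` change that fills the raw header's `Version` field from a variable instead of calling `__llvm_profile_get_version()` directly. A minimal sketch of how such an override might be resolved is shown here. It assumes that a null pointer means "use the host runtime's version", which this diff does not show; `hostProfileVersion` and `selectRawHeaderVersion` are hypothetical names, and the returned constant is an arbitrary placeholder.

```cpp
#include <cstdint>

// Placeholder for the host runtime's __llvm_profile_get_version(). The value
// is arbitrary and only makes the sketch self-contained.
static std::uint64_t hostProfileVersion() { return 1; }

// Assumed behaviour: a null VersionOverride falls back to the host version,
// otherwise the caller-supplied (for example, device-side) version is used.
std::uint64_t selectRawHeaderVersion(const std::uint64_t *VersionOverride) {
  return VersionOverride ? *VersionOverride : hostProfileVersion();
}
```

With this reading, a caller such as libomptarget could pass the version global read back from the device image, while host-only paths keep passing the host version explicitly.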
3 changes: 2 additions & 1 deletion compiler-rt/lib/profile/InstrProfilingBuffer.c
@@ -252,5 +252,6 @@ COMPILER_RT_VISIBILITY int __llvm_profile_write_buffer_internal(
&BufferWriter, DataBegin, DataEnd, CountersBegin, CountersEnd,
BitmapBegin, BitmapEnd, /*VPDataReader=*/0, NamesBegin, NamesEnd,
/*VTableBegin=*/NULL, /*VTableEnd=*/NULL, /*VNamesBegin=*/NULL,
/*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0);
/*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0,
__llvm_profile_get_version());
}