Skip to content

Commit

Permalink
[AMDGPU] Strengthen preload intrinsics to noundef and nonnull
Browse files Browse the repository at this point in the history
The various preloaded registers (workitem IDs, workgroup IDs, various
implicit pointers) always have a finite, inviriant, well-defined value
throughout a well-defined program.

In cases where the compiler infers or the user declares that some
implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the
behavior of the entire program is undefined, since that misdeclaration
can cause arbitrary other preloaded-register intrinsics to access the
wrong register. This case is not expected to arise in practice, but
could occur when the no implicit argument attributes were not cleared
correctly in the prenence of external functions, indrect calls, or
other means of executing un-analyzable code. Failure to detect that
case would be a bug in the attributor.

This commit updates the documentation to reflect this long-standing
reality.

Then, on the basis that all implicit arguments are defined in all
correct programs, the intrinsics that return those values are
annototated with `noundef` and, in the case of implicit pointers,
`nonnull`.

This will prevent spurious calls to `freeze` in front-end
optimizations that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel
arguments which is currently unimplemented)
  • Loading branch information
krzysz00 committed May 20, 2024
1 parent 549fdda commit 584c34a
Show file tree
Hide file tree
Showing 7 changed files with 269 additions and 239 deletions.
7 changes: 5 additions & 2 deletions llvm/docs/AMDGPUUsage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1388,8 +1388,11 @@ The AMDGPU backend supports the following LLVM IR attributes.

"amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
attribute, or reached through a call site marked with this attribute,
the value returned by the intrinsic is undefined. The backend can
attribute, or reached through a call site marked with this attribute, and
that intrinsic is called, the behavior of the program is undefined. (Whole-program
undefined behavior is used here because, for example, the absence of a required workitem
ID in the preloaded register set can mean that all other preloaded registers
are earlier than the compilation assumed they would be.) The backend can
generally infer this during code generation, so typically there is no
benefit to frontends marking functions with this.

Expand Down
44 changes: 27 additions & 17 deletions llvm/include/llvm/IR/IntrinsicsAMDGPU.td
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,16 @@

def global_ptr_ty : LLVMQualPointerType<1>;

// The amdgpu-no-* attributes (ex amdgpu-no-workitem-id-z) typically inferred
// by the backend cause whole-program undefined behavior when violated, such as
// by causing all other preload register intrinsics to return arbitrarily incorrect
// values. Outside of such IR-level UB, these preloaded registers are always set
// to a well-defined value and are thus `noundef`.
class AMDGPUReadPreloadRegisterIntrinsic
: DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
: DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

class AMDGPUReadPreloadRegisterIntrinsicNamed<string name>
: DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;
: DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;

// Used to tag image and resource intrinsics with information used to generate
// mem operands.
Expand Down Expand Up @@ -56,7 +61,7 @@ def int_r600_group_barrier : ClangBuiltin<"__builtin_r600_group_barrier">,
def int_r600_implicitarg_ptr :
ClangBuiltin<"__builtin_r600_implicitarg_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<7>], [],
[IntrNoMem, IntrSpeculatable]>;
[NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_r600_rat_store_typed :
// 1st parameter: Data
Expand Down Expand Up @@ -144,39 +149,44 @@ defm int_amdgcn_workgroup_id : AMDGPUReadPreloadRegisterIntrinsic_xyz_named

def int_amdgcn_dispatch_ptr :
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
[Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
[Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_queue_ptr :
ClangBuiltin<"__builtin_amdgcn_queue_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
[Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
[Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_kernarg_segment_ptr :
ClangBuiltin<"__builtin_amdgcn_kernarg_segment_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
[Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
[Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_implicitarg_ptr :
ClangBuiltin<"__builtin_amdgcn_implicitarg_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
[Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
[Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// Returns the amount of LDS statically allocated for this program.
// This is no longer guaranteed to be a compile-time constant due to linking
// support.
def int_amdgcn_groupstaticsize :
ClangBuiltin<"__builtin_amdgcn_groupstaticsize">,
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_dispatch_id :
ClangBuiltin<"__builtin_amdgcn_dispatch_id">,
DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable]>;
DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// For internal use. Coordinates LDS lowering between IR transform and backend.
def int_amdgcn_lds_kernel_id :
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_implicit_buffer_ptr :
ClangBuiltin<"__builtin_amdgcn_implicit_buffer_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
[Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
[Align<RetIndex, 4>, Dereferenceable<RetIndex, 16>,
NoUndef<RetIndex>, NonNull<RetIndex>,
IntrNoMem, IntrSpeculatable]>;

// Set EXEC to the 64-bit value given.
// This is always moved to the beginning of the basic block.
Expand All @@ -199,7 +209,7 @@ def int_amdgcn_init_exec_from_input : Intrinsic<[],

def int_amdgcn_wavefrontsize :
ClangBuiltin<"__builtin_amdgcn_wavefrontsize">,
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// Represent a relocation constant.
def int_amdgcn_reloc_constant : DefaultAttrsIntrinsic<
Expand Down Expand Up @@ -1923,8 +1933,8 @@ def int_amdgcn_s_setreg :
// s_getpc_b64 instruction returns a zero-extended value.
def int_amdgcn_s_getpc :
ClangBuiltin<"__builtin_amdgcn_s_getpc">,
DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable,
IntrWillReturn]>;
DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem,
IntrSpeculatable, IntrWillReturn]>;

// __builtin_amdgcn_interp_mov <param>, <attr_chan>, <attr>, <m0>
// param values: 0 = P10, 1 = P20, 2 = P0
Expand Down Expand Up @@ -2044,7 +2054,7 @@ def int_amdgcn_ps_live : DefaultAttrsIntrinsic <
// Query currently live lanes.
// Returns true if lane is live (and not a helper lane).
def int_amdgcn_live_mask : DefaultAttrsIntrinsic <[llvm_i1_ty],
[], [IntrReadMem, IntrInaccessibleMemOnly]
[], [NoUndef<RetIndex>, IntrReadMem, IntrInaccessibleMemOnly]
>;

def int_amdgcn_mbcnt_lo :
Expand Down Expand Up @@ -2515,7 +2525,7 @@ def int_amdgcn_mov_dpp8 :
def int_amdgcn_s_get_waveid_in_workgroup :
ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,
Intrinsic<[llvm_i32_ty], [],
[IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
[NoUndef<RetIndex>, IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
[vt],
Expand Down Expand Up @@ -2749,7 +2759,7 @@ def int_amdgcn_global_load_tr_b128 : AMDGPULoadIntrinsic<global_ptr_ty>;

// i32 @llvm.amdgcn.wave.id()
def int_amdgcn_wave_id :
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

//===----------------------------------------------------------------------===//
// Deep learning intrinsics.
Expand Down
2 changes: 0 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -120,8 +120,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
CallInst *KernArgSegment =
Builder.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {},
nullptr, F.getName() + ".kernarg.segment");

KernArgSegment->addRetAttr(Attribute::NonNull);
KernArgSegment->addRetAttr(
Attribute::getWithDereferenceableBytes(Ctx, TotalKernArgSize));

Expand Down
Loading

0 comments on commit 584c34a

Please sign in to comment.