-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU/GlobalISel: Add skeletons for new register bank select passes #112862
base: users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect
Are you sure you want to change the base?
AMDGPU/GlobalISel: Add skeletons for new register bank select passes #112862
Conversation
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @petar-avramovic and the rest of your teammates on Graphite |
@llvm/pr-subscribers-backend-amdgpu Author: Petar Avramovic (petar-avramovic) ChangesNew register bank select for AMDGPU will be split in two passes: Summary of new features:
Patch is 190.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112862.diff 11 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..1f5f22f8e7d8a0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass();
FunctionPass *createSIPeepholeSDWALegacyPass();
FunctionPass *createSILowerI1CopiesLegacyPass();
FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
+FunctionPass *createAMDGPURBSelectPass();
+FunctionPass *createAMDGPURBLegalizePass();
FunctionPass *createSIShrinkInstructionsLegacyPass();
FunctionPass *createSILoadStoreOptimizerLegacyPass();
FunctionPass *createSIWholeQuadModePass();
@@ -188,6 +190,12 @@ extern char &SILowerI1CopiesLegacyID;
void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
extern char &AMDGPUGlobalISelDivergenceLoweringID;
+void initializeAMDGPURBSelectPass(PassRegistry &);
+extern char &AMDGPURBSelectID;
+
+void initializeAMDGPURBLegalizePass(PassRegistry &);
+extern char &AMDGPURBLegalizeID;
+
void initializeAMDGPUMarkLastScratchLoadPass(PassRegistry &);
extern char &AMDGPUMarkLastScratchLoadID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
new file mode 100644
index 00000000000000..9a9722559377f6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
@@ -0,0 +1,74 @@
+//===-- AMDGPURBLegalize.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Lower G_ instructions that can't be inst-selected with register bank
+/// assignment given by RB-select based on machine uniformity info.
+/// Given types on all operands, some register bank assignments require lowering
+/// while other do not.
+/// Note: cases where all register bank assignments would require lowering are
+/// lowered in legalizer.
+/// For example vgpr S64 G_AND requires lowering to S32 while SGPR S64 does not.
+/// Eliminate sgpr S1 by lowering to sgpr S32.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-legalize"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBLegalize : public MachineFunctionPass {
+public:
+ static char ID;
+
+public:
+ AMDGPURBLegalize() : MachineFunctionPass(ID) {
+ initializeAMDGPURBLegalizePass(*PassRegistry::getPassRegistry());
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override { return "AMDGPU RB Legalize"; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+ // If there were no phis and we do waterfall expansion machine verifier would
+ // fail.
+ MachineFunctionProperties getClearedProperties() const override {
+ return MachineFunctionProperties().set(
+ MachineFunctionProperties::Property::NoPHIs);
+ }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+ false)
+INITIALIZE_PASS_END(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+ false)
+
+char AMDGPURBLegalize::ID = 0;
+
+char &llvm::AMDGPURBLegalizeID = AMDGPURBLegalize::ID;
+
+FunctionPass *llvm::createAMDGPURBLegalizePass() {
+ return new AMDGPURBLegalize();
+}
+
+using namespace AMDGPU;
+
+bool AMDGPURBLegalize::runOnMachineFunction(MachineFunction &MF) {
+ return true;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
new file mode 100644
index 00000000000000..c53a68ff72a8ad
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
@@ -0,0 +1,66 @@
+//===-- AMDGPURBSelect.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Assign register banks to all register operands of G_ instructions using
+/// machine uniformity analysis.
+/// SGPR - uniform values and some lane masks
+/// VGPR - divergent, non S1, values
+/// VCC - divergent S1 values(lane masks)
+/// However in some cases G_ instructions with this register bank assignment
+/// can't be inst-selected. This is solved in RBLegalize.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-select"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBSelect : public MachineFunctionPass {
+public:
+ static char ID;
+
+public:
+ AMDGPURBSelect() : MachineFunctionPass(ID) {
+ initializeAMDGPURBSelectPass(*PassRegistry::getPassRegistry());
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override { return "AMDGPU RB select"; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+ // This pass assigns register banks to all virtual registers, and we maintain
+ // this property in subsequent passes
+ MachineFunctionProperties getSetProperties() const override {
+ return MachineFunctionProperties().set(
+ MachineFunctionProperties::Property::RegBankSelected);
+ }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+ false)
+INITIALIZE_PASS_END(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+ false)
+
+char AMDGPURBSelect::ID = 0;
+
+char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
+
+FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
+
+bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index e4cc522194f2a9..c77e6b05395acc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -443,6 +443,11 @@ static cl::opt<bool>
cl::desc("Enable AMDGPUAttributorPass"),
cl::init(true), cl::Hidden);
+static cl::opt<bool> NewRegBankSelect(
+ "new-reg-bank-select",
+ cl::desc("Run rb-select and rb-legalize instead of amdgpu-regbankselect"),
+ cl::init(false), cl::Hidden);
+
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
// Register the target
RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -459,6 +464,8 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeGCNDPPCombineLegacyPass(*PR);
initializeSILowerI1CopiesLegacyPass(*PR);
initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
+ initializeAMDGPURBSelectPass(*PR);
+ initializeAMDGPURBLegalizePass(*PR);
initializeSILowerWWMCopiesPass(*PR);
initializeAMDGPUMarkLastScratchLoadPass(*PR);
initializeSILowerSGPRSpillsLegacyPass(*PR);
@@ -1371,7 +1378,12 @@ void GCNPassConfig::addPreRegBankSelect() {
}
bool GCNPassConfig::addRegBankSelect() {
- addPass(new AMDGPURegBankSelect());
+ if (NewRegBankSelect) {
+ addPass(createAMDGPURBSelectPass());
+ addPass(createAMDGPURBLegalizePass());
+ } else {
+ addPass(new AMDGPURegBankSelect());
+ }
return false;
}
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index fed29c3e14aae2..019ec108670fa8 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -57,6 +57,8 @@ add_llvm_target(AMDGPUCodeGen
AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp
AMDGPUGlobalISelDivergenceLowering.cpp
+ AMDGPURBSelect.cpp
+ AMDGPURBLegalize.cpp
AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp
AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
new file mode 100644
index 00000000000000..880057813adf54
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
@@ -0,0 +1,858 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s
+
+---
+name: uniform_in_vgpr
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: uniform_in_vgpr
+ ; CHECK: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[COPY]](s32)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %0:_(s32) = COPY $sgpr0
+ %1:_(s32) = COPY $sgpr1
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %2:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+ %6:_(s32) = G_FPTOUI %0(s32)
+ %7:_(s32) = G_ADD %6, %1
+ G_STORE %7(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: back_to_back_uniform_in_vgpr
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: back_to_back_uniform_in_vgpr
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY3]](s32), [[COPY4]](s32)
+ ; CHECK-NEXT: [[FADD:%[0-9]+]]:_(s32) = G_FADD [[COPY]], [[COPY1]]
+ ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FADD]](s32)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY2]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %0:_(s32) = COPY $sgpr0
+ %1:_(s32) = COPY $sgpr1
+ %2:_(s32) = COPY $sgpr2
+ %4:_(s32) = COPY $vgpr0
+ %5:_(s32) = COPY $vgpr1
+ %3:_(p1) = G_MERGE_VALUES %4(s32), %5(s32)
+ %7:_(s32) = G_FADD %0, %1
+ %8:_(s32) = G_FPTOUI %7(s32)
+ %9:_(s32) = G_ADD %8, %2
+ G_STORE %9(s32), %3(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: buffer_load_uniform
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: buffer_load_uniform
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr4
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $sgpr0
+ %4:_(s32) = COPY $sgpr1
+ %5:_(s32) = COPY $sgpr2
+ %6:_(s32) = COPY $sgpr3
+ %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+ %1:_(s32) = COPY $sgpr4
+ %7:_(s32) = COPY $vgpr0
+ %8:_(s32) = COPY $vgpr1
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %11:_(s32) = G_CONSTANT i32 0
+ %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ %13:_(s32) = G_CONSTANT i32 1
+ %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+ %14:_(s32) = G_ADD %16, %13
+ G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: buffer_load_divergent
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+
+ ; CHECK-LABEL: name: buffer_load_divergent
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $sgpr0
+ %4:_(s32) = COPY $sgpr1
+ %5:_(s32) = COPY $sgpr2
+ %6:_(s32) = COPY $sgpr3
+ %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+ %1:_(s32) = COPY $vgpr0
+ %7:_(s32) = COPY $vgpr1
+ %8:_(s32) = COPY $vgpr2
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %11:_(s32) = G_CONSTANT i32 0
+ %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ %13:_(s32) = G_CONSTANT i32 1
+ %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+ %14:_(s32) = G_ADD %16, %13
+ G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: vgpr_and_i64
+legalized: true
+body: |
+ bb.1:
+ liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+ ; CHECK-LABEL: name: vgpr_and_i64
+ ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+ ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5
+ ; CHECK-NEXT: [[MV2:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s64) = G_AND [[MV]], [[MV1]]
+ ; CHECK-NEXT: G_STORE [[AND]](s64), [[MV2]](p1) :: (store (s64), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %0:_(s64) = G_MERGE_VALUES %3(s32), %4(s32)
+ %5:_(s32) = COPY $vgpr2
+ %6:_(s32) = COPY $vgpr3
+ %1:_(s64) = G_MERGE_VALUES %5(s32), %6(s32)
+ %7:_(s32) = COPY $vgpr4
+ %8:_(s32) = COPY $vgpr5
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %10:_(s64) = G_AND %0, %1
+ G_STORE %10(s64), %2(p1) :: (store (s64), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: abs_sgpr_i16
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: abs_sgpr_i16
+ ; CHECK: liveins: $sgpr0, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
+ ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
+ ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
+ ; CHECK-NEXT: G_STORE [[ANYEXT]](s32), [[MV]](p1) :: (store (s16), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %2:_(s32) = COPY $sgpr0
+ %0:_(s16) = G_TRUNC %2(s32)
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %1:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+ %6:_(s16) = G_ABS %0
+ %7:_(s32) = G_ANYEXT %6(s16)
+ G_STORE %7(s32), %1(p1) :: (store (s16), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: uniform_i1_phi
+legalized: true
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: uniform_i1_phi
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x30000000), %bb.2(0x50000000)
+ ; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 6
+ ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[COPY3]](s32), [[C1]]
+ ; CHECK-NEXT: G_BRCOND [[ICMP1]](s1), %bb.2
+ ; CHECK-NEXT: G_BR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ult), [[COPY2]](s32), [[C2]]
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:_(s1) = G_PHI [[ICMP]](s1), %bb.0, [[ICMP2]](s1), %bb.1
+ ; CHECK-NEXT: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[PHI]](s1)
+ ; CHECK-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT]], [[C3]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.1:
+ successors: %bb.2(0x30000000), %bb.3(0x50000000)
+ liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
...
[truncated]
|
@llvm/pr-subscribers-llvm-globalisel Author: Petar Avramovic (petar-avramovic) ChangesNew register bank select for AMDGPU will be split in two passes: Summary of new features:
Patch is 190.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112862.diff 11 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..1f5f22f8e7d8a0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass();
FunctionPass *createSIPeepholeSDWALegacyPass();
FunctionPass *createSILowerI1CopiesLegacyPass();
FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
+FunctionPass *createAMDGPURBSelectPass();
+FunctionPass *createAMDGPURBLegalizePass();
FunctionPass *createSIShrinkInstructionsLegacyPass();
FunctionPass *createSILoadStoreOptimizerLegacyPass();
FunctionPass *createSIWholeQuadModePass();
@@ -188,6 +190,12 @@ extern char &SILowerI1CopiesLegacyID;
void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
extern char &AMDGPUGlobalISelDivergenceLoweringID;
+void initializeAMDGPURBSelectPass(PassRegistry &);
+extern char &AMDGPURBSelectID;
+
+void initializeAMDGPURBLegalizePass(PassRegistry &);
+extern char &AMDGPURBLegalizeID;
+
void initializeAMDGPUMarkLastScratchLoadPass(PassRegistry &);
extern char &AMDGPUMarkLastScratchLoadID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
new file mode 100644
index 00000000000000..9a9722559377f6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
@@ -0,0 +1,74 @@
+//===-- AMDGPURBLegalize.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Lower G_ instructions that can't be inst-selected with register bank
+/// assignment given by RB-select based on machine uniformity info.
+/// Given types on all operands, some register bank assignments require lowering
+/// while other do not.
+/// Note: cases where all register bank assignments would require lowering are
+/// lowered in legalizer.
+/// For example vgpr S64 G_AND requires lowering to S32 while SGPR S64 does not.
+/// Eliminate sgpr S1 by lowering to sgpr S32.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-legalize"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBLegalize : public MachineFunctionPass {
+public:
+ static char ID;
+
+public:
+ AMDGPURBLegalize() : MachineFunctionPass(ID) {
+ initializeAMDGPURBLegalizePass(*PassRegistry::getPassRegistry());
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override { return "AMDGPU RB Legalize"; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+ // If there were no phis and we do waterfall expansion machine verifier would
+ // fail.
+ MachineFunctionProperties getClearedProperties() const override {
+ return MachineFunctionProperties().set(
+ MachineFunctionProperties::Property::NoPHIs);
+ }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+ false)
+INITIALIZE_PASS_END(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+ false)
+
+char AMDGPURBLegalize::ID = 0;
+
+char &llvm::AMDGPURBLegalizeID = AMDGPURBLegalize::ID;
+
+FunctionPass *llvm::createAMDGPURBLegalizePass() {
+ return new AMDGPURBLegalize();
+}
+
+using namespace AMDGPU;
+
+bool AMDGPURBLegalize::runOnMachineFunction(MachineFunction &MF) {
+ return true;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
new file mode 100644
index 00000000000000..c53a68ff72a8ad
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
@@ -0,0 +1,66 @@
+//===-- AMDGPURBSelect.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Assign register banks to all register operands of G_ instructions using
+/// machine uniformity analysis.
+/// SGPR - uniform values and some lane masks
+/// VGPR - divergent, non S1, values
+/// VCC - divergent S1 values(lane masks)
+/// However in some cases G_ instructions with this register bank assignment
+/// can't be inst-selected. This is solved in RBLegalize.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-select"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBSelect : public MachineFunctionPass {
+public:
+ static char ID;
+
+public:
+ AMDGPURBSelect() : MachineFunctionPass(ID) {
+ initializeAMDGPURBSelectPass(*PassRegistry::getPassRegistry());
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override { return "AMDGPU RB select"; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+ // This pass assigns register banks to all virtual registers, and we maintain
+ // this property in subsequent passes
+ MachineFunctionProperties getSetProperties() const override {
+ return MachineFunctionProperties().set(
+ MachineFunctionProperties::Property::RegBankSelected);
+ }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+ false)
+INITIALIZE_PASS_END(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+ false)
+
+char AMDGPURBSelect::ID = 0;
+
+char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
+
+FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
+
+bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index e4cc522194f2a9..c77e6b05395acc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -443,6 +443,11 @@ static cl::opt<bool>
cl::desc("Enable AMDGPUAttributorPass"),
cl::init(true), cl::Hidden);
+static cl::opt<bool> NewRegBankSelect(
+ "new-reg-bank-select",
+ cl::desc("Run rb-select and rb-legalize instead of amdgpu-regbankselect"),
+ cl::init(false), cl::Hidden);
+
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
// Register the target
RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -459,6 +464,8 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeGCNDPPCombineLegacyPass(*PR);
initializeSILowerI1CopiesLegacyPass(*PR);
initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
+ initializeAMDGPURBSelectPass(*PR);
+ initializeAMDGPURBLegalizePass(*PR);
initializeSILowerWWMCopiesPass(*PR);
initializeAMDGPUMarkLastScratchLoadPass(*PR);
initializeSILowerSGPRSpillsLegacyPass(*PR);
@@ -1371,7 +1378,12 @@ void GCNPassConfig::addPreRegBankSelect() {
}
bool GCNPassConfig::addRegBankSelect() {
- addPass(new AMDGPURegBankSelect());
+ if (NewRegBankSelect) {
+ addPass(createAMDGPURBSelectPass());
+ addPass(createAMDGPURBLegalizePass());
+ } else {
+ addPass(new AMDGPURegBankSelect());
+ }
return false;
}
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index fed29c3e14aae2..019ec108670fa8 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -57,6 +57,8 @@ add_llvm_target(AMDGPUCodeGen
AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp
AMDGPUGlobalISelDivergenceLowering.cpp
+ AMDGPURBSelect.cpp
+ AMDGPURBLegalize.cpp
AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp
AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
new file mode 100644
index 00000000000000..880057813adf54
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
@@ -0,0 +1,858 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s
+
+---
+name: uniform_in_vgpr
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: uniform_in_vgpr
+ ; CHECK: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[COPY]](s32)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %0:_(s32) = COPY $sgpr0
+ %1:_(s32) = COPY $sgpr1
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %2:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+ %6:_(s32) = G_FPTOUI %0(s32)
+ %7:_(s32) = G_ADD %6, %1
+ G_STORE %7(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: back_to_back_uniform_in_vgpr
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: back_to_back_uniform_in_vgpr
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY3]](s32), [[COPY4]](s32)
+ ; CHECK-NEXT: [[FADD:%[0-9]+]]:_(s32) = G_FADD [[COPY]], [[COPY1]]
+ ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FADD]](s32)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY2]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %0:_(s32) = COPY $sgpr0
+ %1:_(s32) = COPY $sgpr1
+ %2:_(s32) = COPY $sgpr2
+ %4:_(s32) = COPY $vgpr0
+ %5:_(s32) = COPY $vgpr1
+ %3:_(p1) = G_MERGE_VALUES %4(s32), %5(s32)
+ %7:_(s32) = G_FADD %0, %1
+ %8:_(s32) = G_FPTOUI %7(s32)
+ %9:_(s32) = G_ADD %8, %2
+ G_STORE %9(s32), %3(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: buffer_load_uniform
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: buffer_load_uniform
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr4
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $sgpr0
+ %4:_(s32) = COPY $sgpr1
+ %5:_(s32) = COPY $sgpr2
+ %6:_(s32) = COPY $sgpr3
+ %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+ %1:_(s32) = COPY $sgpr4
+ %7:_(s32) = COPY $vgpr0
+ %8:_(s32) = COPY $vgpr1
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %11:_(s32) = G_CONSTANT i32 0
+ %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ %13:_(s32) = G_CONSTANT i32 1
+ %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+ %14:_(s32) = G_ADD %16, %13
+ G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: buffer_load_divergent
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+
+ ; CHECK-LABEL: name: buffer_load_divergent
+ ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $sgpr0
+ %4:_(s32) = COPY $sgpr1
+ %5:_(s32) = COPY $sgpr2
+ %6:_(s32) = COPY $sgpr3
+ %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+ %1:_(s32) = COPY $vgpr0
+ %7:_(s32) = COPY $vgpr1
+ %8:_(s32) = COPY $vgpr2
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %11:_(s32) = G_CONSTANT i32 0
+ %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+ %13:_(s32) = G_CONSTANT i32 1
+ %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+ %14:_(s32) = G_ADD %16, %13
+ G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: vgpr_and_i64
+legalized: true
+body: |
+ bb.1:
+ liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+ ; CHECK-LABEL: name: vgpr_and_i64
+ ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+ ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5
+ ; CHECK-NEXT: [[MV2:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s64) = G_AND [[MV]], [[MV1]]
+ ; CHECK-NEXT: G_STORE [[AND]](s64), [[MV2]](p1) :: (store (s64), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %0:_(s64) = G_MERGE_VALUES %3(s32), %4(s32)
+ %5:_(s32) = COPY $vgpr2
+ %6:_(s32) = COPY $vgpr3
+ %1:_(s64) = G_MERGE_VALUES %5(s32), %6(s32)
+ %7:_(s32) = COPY $vgpr4
+ %8:_(s32) = COPY $vgpr5
+ %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+ %10:_(s64) = G_AND %0, %1
+ G_STORE %10(s64), %2(p1) :: (store (s64), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: abs_sgpr_i16
+legalized: true
+body: |
+ bb.1:
+ liveins: $sgpr0, $vgpr0, $vgpr1
+
+ ; CHECK-LABEL: name: abs_sgpr_i16
+ ; CHECK: liveins: $sgpr0, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
+ ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
+ ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
+ ; CHECK-NEXT: G_STORE [[ANYEXT]](s32), [[MV]](p1) :: (store (s16), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %2:_(s32) = COPY $sgpr0
+ %0:_(s16) = G_TRUNC %2(s32)
+ %3:_(s32) = COPY $vgpr0
+ %4:_(s32) = COPY $vgpr1
+ %1:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+ %6:_(s16) = G_ABS %0
+ %7:_(s32) = G_ANYEXT %6(s16)
+ G_STORE %7(s32), %1(p1) :: (store (s16), addrspace 1)
+ S_ENDPGM 0
+...
+
+---
+name: uniform_i1_phi
+legalized: true
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: uniform_i1_phi
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x30000000), %bb.2(0x50000000)
+ ; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr0
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr1
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 6
+ ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[COPY3]](s32), [[C1]]
+ ; CHECK-NEXT: G_BRCOND [[ICMP1]](s1), %bb.2
+ ; CHECK-NEXT: G_BR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ult), [[COPY2]](s32), [[C2]]
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:_(s1) = G_PHI [[ICMP]](s1), %bb.0, [[ICMP2]](s1), %bb.1
+ ; CHECK-NEXT: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[PHI]](s1)
+ ; CHECK-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+ ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT]], [[C3]]
+ ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.1:
+ successors: %bb.2(0x30000000), %bb.3(0x50000000)
+ liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
...
[truncated]
|
llvm/lib/Target/AMDGPU/AMDGPU.h
Outdated
@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass(); | |||
FunctionPass *createSIPeepholeSDWALegacyPass(); | |||
FunctionPass *createSILowerI1CopiesLegacyPass(); | |||
FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass(); | |||
FunctionPass *createAMDGPURBSelectPass(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please spell out RegBank wherever possible, it makes it easier to grep for things in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will wait for more comments then bulk update everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any suggestions since AMDGPURegBankSelect is taken?
AMDGPURBSelect -> ...
AMDGPURBLegalize -> AMDGPURegBankLegalize is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk, I was hoping we could get rid of the other one :) Maybe we could call this one AMDGPUStandaloneRegBankSelect, since it's not inheriting from RegBankSelect (at least if that's the intention for the long term).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just delete the other one? I don't think it's doing anything now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AMDGPURegBankSelect is functional except for some functions with divergent control flow. Deleting it now breaks almost all globalisel regression tests.
When new regbankselect path gets functional we can make it a default path and decide what to do with AMDGPURegBankSelect, maybe rename it to legacy or just delete it.
New pass names as suggested:
AMDGPUStandaloneRegBankSelect
AMDGPURegBankLegalize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete the old one and replace it with the generic regbankselect pass. It's not doing anything in the pass other than running the analysis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting default reg-bank-select to RegBankSelect, new pass will take
AMDGPURegBankSelect
addPass(createAMDGPURBSelectPass()); | ||
addPass(createAMDGPURBLegalizePass()); | ||
} else { | ||
addPass(new AMDGPURegBankSelect()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just modify this AMDGPURegBankSelect instead of adding a new one? It was meant to use uniformity analysis too. Maybe you can propagate the flag into the pass if you want to keep the old behaviour around in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Older versions were doing that. New passes are much simpler and don't use anything from generic reg-bank-select(except register banks that come from td file via RegisterBankInfo).
To simplify review process and transition period I would prefer two passes separate from AMDGPURegBankSelect, rb-legalize will also be split across few files.
Old behavior is there to keep upcoming patches small by gradually switching to -new-reg-bank-select.
//===----------------------------------------------------------------------===// | ||
// | ||
/// Lower G_ instructions that can't be inst-selected with register bank | ||
/// assignment given by RB-select based on machine uniformity info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe mention explicitly that this is expected to run after RB-select? This might be surprising since the usual GISel pipeline has legalization first and then RB-select.
@@ -0,0 +1,858 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this run rb-legalize? It only returns true, so it should be pretty harmless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, will update mir test run lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, correction, not true. Machine verifier will complain that Generic virtual register must have a bank in a RegBankSelected function.
Now that I think of it, RBLegalize should probably also require that MachineFunction has Property::RegBankSelected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the point of this test is then. If it's testing the verifier, it should go in test/MachineVerifier (and then doesn't need -verify-machineinstrs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test is precommit to see something in diff when RBSelect and RBLegalize become functional instead of seeing new test with no changes.
8bf0a23
to
84284ba
Compare
73862a2
to
6f8e867
Compare
84284ba
to
3b0aaef
Compare
6f8e867
to
f6c230d
Compare
3b0aaef
to
07055a7
Compare
f6c230d
to
5fa4bae
Compare
New register bank select for AMDGPU will be split in two passes: - AMDGPURegBankSelect: select banks based on machine uniformity analysis - AMDGPURegBankLegalize: lower instructions that can't be inst-selected with register banks assigned by AMDGPURegBankSelect. AMDGPURegBankLegalize is similar to legalizer but with context of uniformity analysis. Does not change already assigned banks. Main goal of AMDGPURegBankLegalize is to provide high level table-like overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). See RegBankLegalizeRules. Summary of new features: At the moment register bank select assigns register bank to output register using simple algorithm: - one of the inputs is vgpr output is vgpr - all inputs are sgpr output is sgpr. When function does not contain divergent control flow propagating register banks like this works. In general, first point is still correct but second is not when function contains divergent control flow. Examples: - Phi with uniform inputs that go through divergent branch - Instruction with temporal divergent use. To fix this AMDGPURegBankSelect will use machine uniformity analysis to assign vgpr to each divergent and sgpr to each uniform instruction. But some instructions are only available on VALU (for example floating point instructions before gfx1150) and we need to assign vgpr to them. Since we are no longer propagating register banks we need to ensure that uniform instructions get their inputs in sgpr in some way. In AMDGPURegBankLegalize uniform instructions that are only available on VALU will be reassigned to vgpr on all operands and read-any-lane vgpr output to original sgpr output.
07055a7
to
623266f
Compare
New register bank select for AMDGPU will be split in two passes:
with register banks assigned by AMDGPURegBankSelect.
AMDGPURegBankLegalize is similar to legalizer but with context of
uniformity analysis. Does not change already assigned banks.
Main goal of AMDGPURegBankLegalize is to provide high level table-like
overview of how to lower generic instructions based on available target
features and uniformity info (uniform vs divergent).
See RegBankLegalizeRules.
Summary of new features:
At the moment register bank select assigns register bank to output
register using simple algorithm:
When function does not contain divergent control flow propagating
register banks like this works. In general, first point is still correct
but second is not when function contains divergent control flow.
Examples:
To fix this AMDGPURegBankSelect will use machine uniformity analysis
to assign vgpr to each divergent and sgpr to each uniform instruction.
But some instructions are only available on VALU (for example floating
point instructions before gfx1150) and we need to assign vgpr to them.
Since we are no longer propagating register banks we need to ensure that
uniform instructions get their inputs in sgpr in some way.
In AMDGPURegBankLegalize uniform instructions that are only available on
VALU will be reassigned to vgpr on all operands and read-any-lane vgpr
output to original sgpr output.