Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGPU/GlobalISel: Add skeletons for new register bank select passes #112862

Open
wants to merge 1 commit into
base: users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect
Choose a base branch
from

Conversation

petar-avramovic
Copy link
Collaborator

@petar-avramovic petar-avramovic commented Oct 18, 2024

New register bank select for AMDGPU will be split in two passes:

  • AMDGPURegBankSelect: select banks based on machine uniformity analysis
  • AMDGPURegBankLegalize: lower instructions that can't be inst-selected
    with register banks assigned by AMDGPURegBankSelect.
    AMDGPURegBankLegalize is similar to legalizer but with context of
    uniformity analysis. Does not change already assigned banks.
    Main goal of AMDGPURegBankLegalize is to provide high level table-like
    overview of how to lower generic instructions based on available target
    features and uniformity info (uniform vs divergent).
    See RegBankLegalizeRules.

Summary of new features:
At the moment register bank select assigns register bank to output
register using simple algorithm:

  • one of the inputs is vgpr output is vgpr
  • all inputs are sgpr output is sgpr.
    When function does not contain divergent control flow propagating
    register banks like this works. In general, first point is still correct
    but second is not when function contains divergent control flow.
    Examples:
  • Phi with uniform inputs that go through divergent branch
  • Instruction with temporal divergent use.
    To fix this AMDGPURegBankSelect will use machine uniformity analysis
    to assign vgpr to each divergent and sgpr to each uniform instruction.
    But some instructions are only available on VALU (for example floating
    point instructions before gfx1150) and we need to assign vgpr to them.
    Since we are no longer propagating register banks we need to ensure that
    uniform instructions get their inputs in sgpr in some way.
    In AMDGPURegBankLegalize uniform instructions that are only available on
    VALU will be reassigned to vgpr on all operands and read-any-lane vgpr
    output to original sgpr output.

Copy link
Collaborator Author

petar-avramovic commented Oct 18, 2024

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @petar-avramovic and the rest of your teammates on Graphite Graphite

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)

Changes

New register bank select for AMDGPU will be split in two passes:
RBSelect: select banks based on machine uniformity analysis
RBLegalize: lower instructions that can't be inst-selected with register
banks assigned by RBSelect. Does not change already assigned banks.
Similar to legalizer but with context of uniformity analysis.
RBLegalize main goal is to provide high level table-like overview of how
to lower generic instructions based on available target features and
uniformity info (uniform vs divergent). See RegBankLegalizeRules.

Summary of new features:
At the moment reg bank select assigns register bank to output register
using simple algorithm:

  • one of the inputs is vgpr output is vgpr
  • all inputs are sgpr output is sgpr.
    When function does not contain divergent control flow propagating reg
    banks like this works. In general, first point is still correct but
    second is not when function contains divergent control flow. Examples:
  • Phi with uniform inputs that go through divergent branch
  • Instruction with temporal divergent use.
    To fix this RB-select will use machine uniformity analysis to assign
    vgpr to each divergent and sgpr to each uniform instruction.
    But some instructions are only available on VALU (for example floating
    point instructions before gfx1150) and we need to assign vgpr to them.
    Since we are no longer propagating reg banks we need to ensure that
    uniform instructions get their inputs in sgpr in some way.
    In RB-legalize uniform instructions that are only available on VALU will
    be reassigned to vgpr on all operands and readfirstlane vgpr output to
    original sgpr output.

Patch is 190.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112862.diff

11 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+8)
  • (added) llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp (+74)
  • (added) llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp (+66)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+13-1)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+2)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir (+858)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-select.mir (+858)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-salu-float.ll (+54)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-salu-float.mir (+92)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.ll (+635)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.mir (+1377)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..1f5f22f8e7d8a0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass();
 FunctionPass *createSIPeepholeSDWALegacyPass();
 FunctionPass *createSILowerI1CopiesLegacyPass();
 FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
+FunctionPass *createAMDGPURBSelectPass();
+FunctionPass *createAMDGPURBLegalizePass();
 FunctionPass *createSIShrinkInstructionsLegacyPass();
 FunctionPass *createSILoadStoreOptimizerLegacyPass();
 FunctionPass *createSIWholeQuadModePass();
@@ -188,6 +190,12 @@ extern char &SILowerI1CopiesLegacyID;
 void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
 extern char &AMDGPUGlobalISelDivergenceLoweringID;
 
+void initializeAMDGPURBSelectPass(PassRegistry &);
+extern char &AMDGPURBSelectID;
+
+void initializeAMDGPURBLegalizePass(PassRegistry &);
+extern char &AMDGPURBLegalizeID;
+
 void initializeAMDGPUMarkLastScratchLoadPass(PassRegistry &);
 extern char &AMDGPUMarkLastScratchLoadID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
new file mode 100644
index 00000000000000..9a9722559377f6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
@@ -0,0 +1,74 @@
+//===-- AMDGPURBLegalize.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Lower G_ instructions that can't be inst-selected with register bank
+/// assignment given by RB-select based on machine uniformity info.
+/// Given types on all operands, some register bank assignments require lowering
+/// while other do not.
+/// Note: cases where all register bank assignments would require lowering are
+/// lowered in legalizer.
+/// For example vgpr S64 G_AND requires lowering to S32 while SGPR S64 does not.
+/// Eliminate sgpr S1 by lowering to sgpr S32.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-legalize"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBLegalize : public MachineFunctionPass {
+public:
+  static char ID;
+
+public:
+  AMDGPURBLegalize() : MachineFunctionPass(ID) {
+    initializeAMDGPURBLegalizePass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  StringRef getPassName() const override { return "AMDGPU RB Legalize"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  // If there were no phis and we do waterfall expansion machine verifier would
+  // fail.
+  MachineFunctionProperties getClearedProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::NoPHIs);
+  }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+                      false)
+INITIALIZE_PASS_END(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+                    false)
+
+char AMDGPURBLegalize::ID = 0;
+
+char &llvm::AMDGPURBLegalizeID = AMDGPURBLegalize::ID;
+
+FunctionPass *llvm::createAMDGPURBLegalizePass() {
+  return new AMDGPURBLegalize();
+}
+
+using namespace AMDGPU;
+
+bool AMDGPURBLegalize::runOnMachineFunction(MachineFunction &MF) {
+  return true;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
new file mode 100644
index 00000000000000..c53a68ff72a8ad
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
@@ -0,0 +1,66 @@
+//===-- AMDGPURBSelect.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Assign register banks to all register operands of G_ instructions using
+/// machine uniformity analysis.
+/// SGPR - uniform values and some lane masks
+/// VGPR - divergent, non S1, values
+/// VCC  - divergent S1 values(lane masks)
+/// However in some cases G_ instructions with this register bank assignment
+/// can't be inst-selected. This is solved in RBLegalize.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-select"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBSelect : public MachineFunctionPass {
+public:
+  static char ID;
+
+public:
+  AMDGPURBSelect() : MachineFunctionPass(ID) {
+    initializeAMDGPURBSelectPass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  StringRef getPassName() const override { return "AMDGPU RB select"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  // This pass assigns register banks to all virtual registers, and we maintain
+  // this property in subsequent passes
+  MachineFunctionProperties getSetProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::RegBankSelected);
+  }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+                      false)
+INITIALIZE_PASS_END(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+                    false)
+
+char AMDGPURBSelect::ID = 0;
+
+char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
+
+FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
+
+bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index e4cc522194f2a9..c77e6b05395acc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -443,6 +443,11 @@ static cl::opt<bool>
                            cl::desc("Enable AMDGPUAttributorPass"),
                            cl::init(true), cl::Hidden);
 
+static cl::opt<bool> NewRegBankSelect(
+    "new-reg-bank-select",
+    cl::desc("Run rb-select and rb-legalize instead of amdgpu-regbankselect"),
+    cl::init(false), cl::Hidden);
+
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   // Register the target
   RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -459,6 +464,8 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeGCNDPPCombineLegacyPass(*PR);
   initializeSILowerI1CopiesLegacyPass(*PR);
   initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
+  initializeAMDGPURBSelectPass(*PR);
+  initializeAMDGPURBLegalizePass(*PR);
   initializeSILowerWWMCopiesPass(*PR);
   initializeAMDGPUMarkLastScratchLoadPass(*PR);
   initializeSILowerSGPRSpillsLegacyPass(*PR);
@@ -1371,7 +1378,12 @@ void GCNPassConfig::addPreRegBankSelect() {
 }
 
 bool GCNPassConfig::addRegBankSelect() {
-  addPass(new AMDGPURegBankSelect());
+  if (NewRegBankSelect) {
+    addPass(createAMDGPURBSelectPass());
+    addPass(createAMDGPURBLegalizePass());
+  } else {
+    addPass(new AMDGPURegBankSelect());
+  }
   return false;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index fed29c3e14aae2..019ec108670fa8 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -57,6 +57,8 @@ add_llvm_target(AMDGPUCodeGen
   AMDGPUExportClustering.cpp
   AMDGPUFrameLowering.cpp
   AMDGPUGlobalISelDivergenceLowering.cpp
+  AMDGPURBSelect.cpp
+  AMDGPURBLegalize.cpp
   AMDGPUGlobalISelUtils.cpp
   AMDGPUHSAMetadataStreamer.cpp
   AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
new file mode 100644
index 00000000000000..880057813adf54
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
@@ -0,0 +1,858 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s
+
+---
+name: uniform_in_vgpr
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: uniform_in_vgpr
+    ; CHECK: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[COPY]](s32)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = COPY $sgpr1
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %2:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+    %6:_(s32) = G_FPTOUI %0(s32)
+    %7:_(s32) = G_ADD %6, %1
+    G_STORE %7(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: back_to_back_uniform_in_vgpr
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: back_to_back_uniform_in_vgpr
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY3]](s32), [[COPY4]](s32)
+    ; CHECK-NEXT: [[FADD:%[0-9]+]]:_(s32) = G_FADD [[COPY]], [[COPY1]]
+    ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FADD]](s32)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY2]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = COPY $sgpr1
+    %2:_(s32) = COPY $sgpr2
+    %4:_(s32) = COPY $vgpr0
+    %5:_(s32) = COPY $vgpr1
+    %3:_(p1) = G_MERGE_VALUES %4(s32), %5(s32)
+    %7:_(s32) = G_FADD %0, %1
+    %8:_(s32) = G_FPTOUI %7(s32)
+    %9:_(s32) = G_ADD %8, %2
+    G_STORE %9(s32), %3(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: buffer_load_uniform
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: buffer_load_uniform
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr4
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $sgpr0
+    %4:_(s32) = COPY $sgpr1
+    %5:_(s32) = COPY $sgpr2
+    %6:_(s32) = COPY $sgpr3
+    %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+    %1:_(s32) = COPY $sgpr4
+    %7:_(s32) = COPY $vgpr0
+    %8:_(s32) = COPY $vgpr1
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %11:_(s32) = G_CONSTANT i32 0
+    %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    %13:_(s32) = G_CONSTANT i32 1
+    %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+    %14:_(s32) = G_ADD %16, %13
+    G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: buffer_load_divergent
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+
+    ; CHECK-LABEL: name: buffer_load_divergent
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr2
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $sgpr0
+    %4:_(s32) = COPY $sgpr1
+    %5:_(s32) = COPY $sgpr2
+    %6:_(s32) = COPY $sgpr3
+    %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+    %1:_(s32) = COPY $vgpr0
+    %7:_(s32) = COPY $vgpr1
+    %8:_(s32) = COPY $vgpr2
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %11:_(s32) = G_CONSTANT i32 0
+    %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    %13:_(s32) = G_CONSTANT i32 1
+    %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+    %14:_(s32) = G_ADD %16, %13
+    G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: vgpr_and_i64
+legalized: true
+body: |
+  bb.1:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: vgpr_and_i64
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+    ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5
+    ; CHECK-NEXT: [[MV2:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32)
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s64) = G_AND [[MV]], [[MV1]]
+    ; CHECK-NEXT: G_STORE [[AND]](s64), [[MV2]](p1) :: (store (s64), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %0:_(s64) = G_MERGE_VALUES %3(s32), %4(s32)
+    %5:_(s32) = COPY $vgpr2
+    %6:_(s32) = COPY $vgpr3
+    %1:_(s64) = G_MERGE_VALUES %5(s32), %6(s32)
+    %7:_(s32) = COPY $vgpr4
+    %8:_(s32) = COPY $vgpr5
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %10:_(s64) = G_AND %0, %1
+    G_STORE %10(s64), %2(p1) :: (store (s64), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: abs_sgpr_i16
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: abs_sgpr_i16
+    ; CHECK: liveins: $sgpr0, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
+    ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
+    ; CHECK-NEXT: G_STORE [[ANYEXT]](s32), [[MV]](p1) :: (store (s16), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %2:_(s32) = COPY $sgpr0
+    %0:_(s16) = G_TRUNC %2(s32)
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %1:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+    %6:_(s16) = G_ABS %0
+    %7:_(s32) = G_ANYEXT %6(s16)
+    G_STORE %7(s32), %1(p1) :: (store (s16), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: uniform_i1_phi
+legalized: true
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: uniform_i1_phi
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x30000000), %bb.2(0x50000000)
+  ; CHECK-NEXT:   liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr0
+  ; CHECK-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr1
+  ; CHECK-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 6
+  ; CHECK-NEXT:   [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
+  ; CHECK-NEXT:   [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; CHECK-NEXT:   [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[COPY3]](s32), [[C1]]
+  ; CHECK-NEXT:   G_BRCOND [[ICMP1]](s1), %bb.2
+  ; CHECK-NEXT:   G_BR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+  ; CHECK-NEXT:   [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ult), [[COPY2]](s32), [[C2]]
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s1) = G_PHI [[ICMP]](s1), %bb.0, [[ICMP2]](s1), %bb.1
+  ; CHECK-NEXT:   [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[PHI]](s1)
+  ; CHECK-NEXT:   [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+  ; CHECK-NEXT:   [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT]], [[C3]]
+  ; CHECK-NEXT:   G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    successors: %bb.2(0x30000000), %bb.3(0x50000000)
+    liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
...
[truncated]

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Petar Avramovic (petar-avramovic)

Changes

New register bank select for AMDGPU will be split in two passes:
RBSelect: select banks based on machine uniformity analysis
RBLegalize: lower instructions that can't be inst-selected with register
banks assigned by RBSelect. Does not change already assigned banks.
Similar to legalizer but with context of uniformity analysis.
RBLegalize main goal is to provide high level table-like overview of how
to lower generic instructions based on available target features and
uniformity info (uniform vs divergent). See RegBankLegalizeRules.

Summary of new features:
At the moment reg bank select assigns register bank to output register
using simple algorithm:

  • one of the inputs is vgpr output is vgpr
  • all inputs are sgpr output is sgpr.
    When function does not contain divergent control flow propagating reg
    banks like this works. In general, first point is still correct but
    second is not when function contains divergent control flow. Examples:
  • Phi with uniform inputs that go through divergent branch
  • Instruction with temporal divergent use.
    To fix this RB-select will use machine uniformity analysis to assign
    vgpr to each divergent and sgpr to each uniform instruction.
    But some instructions are only available on VALU (for example floating
    point instructions before gfx1150) and we need to assign vgpr to them.
    Since we are no longer propagating reg banks we need to ensure that
    uniform instructions get their inputs in sgpr in some way.
    In RB-legalize uniform instructions that are only available on VALU will
    be reassigned to vgpr on all operands and readfirstlane vgpr output to
    original sgpr output.

Patch is 190.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112862.diff

11 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+8)
  • (added) llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp (+74)
  • (added) llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp (+66)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+13-1)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+2)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir (+858)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-select.mir (+858)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-salu-float.ll (+54)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-salu-float.mir (+92)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.ll (+635)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.mir (+1377)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..1f5f22f8e7d8a0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass();
 FunctionPass *createSIPeepholeSDWALegacyPass();
 FunctionPass *createSILowerI1CopiesLegacyPass();
 FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
+FunctionPass *createAMDGPURBSelectPass();
+FunctionPass *createAMDGPURBLegalizePass();
 FunctionPass *createSIShrinkInstructionsLegacyPass();
 FunctionPass *createSILoadStoreOptimizerLegacyPass();
 FunctionPass *createSIWholeQuadModePass();
@@ -188,6 +190,12 @@ extern char &SILowerI1CopiesLegacyID;
 void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
 extern char &AMDGPUGlobalISelDivergenceLoweringID;
 
+void initializeAMDGPURBSelectPass(PassRegistry &);
+extern char &AMDGPURBSelectID;
+
+void initializeAMDGPURBLegalizePass(PassRegistry &);
+extern char &AMDGPURBLegalizeID;
+
 void initializeAMDGPUMarkLastScratchLoadPass(PassRegistry &);
 extern char &AMDGPUMarkLastScratchLoadID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
new file mode 100644
index 00000000000000..9a9722559377f6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp
@@ -0,0 +1,74 @@
+//===-- AMDGPURBLegalize.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Lower G_ instructions that can't be inst-selected with register bank
+/// assignment given by RB-select based on machine uniformity info.
+/// Given types on all operands, some register bank assignments require lowering
+/// while other do not.
+/// Note: cases where all register bank assignments would require lowering are
+/// lowered in legalizer.
+/// For example vgpr S64 G_AND requires lowering to S32 while SGPR S64 does not.
+/// Eliminate sgpr S1 by lowering to sgpr S32.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-legalize"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBLegalize : public MachineFunctionPass {
+public:
+  static char ID;
+
+public:
+  AMDGPURBLegalize() : MachineFunctionPass(ID) {
+    initializeAMDGPURBLegalizePass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  StringRef getPassName() const override { return "AMDGPU RB Legalize"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  // If there were no phis and we do waterfall expansion machine verifier would
+  // fail.
+  MachineFunctionProperties getClearedProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::NoPHIs);
+  }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+                      false)
+INITIALIZE_PASS_END(AMDGPURBLegalize, DEBUG_TYPE, "AMDGPU RB Legalize", false,
+                    false)
+
+char AMDGPURBLegalize::ID = 0;
+
+char &llvm::AMDGPURBLegalizeID = AMDGPURBLegalize::ID;
+
+FunctionPass *llvm::createAMDGPURBLegalizePass() {
+  return new AMDGPURBLegalize();
+}
+
+using namespace AMDGPU;
+
+bool AMDGPURBLegalize::runOnMachineFunction(MachineFunction &MF) {
+  return true;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
new file mode 100644
index 00000000000000..c53a68ff72a8ad
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBSelect.cpp
@@ -0,0 +1,66 @@
+//===-- AMDGPURBSelect.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// Assign register banks to all register operands of G_ instructions using
+/// machine uniformity analysis.
+/// SGPR - uniform values and some lane masks
+/// VGPR - divergent, non S1, values
+/// VCC  - divergent S1 values(lane masks)
+/// However in some cases G_ instructions with this register bank assignment
+/// can't be inst-selected. This is solved in RBLegalize.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "rb-select"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPURBSelect : public MachineFunctionPass {
+public:
+  static char ID;
+
+public:
+  AMDGPURBSelect() : MachineFunctionPass(ID) {
+    initializeAMDGPURBSelectPass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  StringRef getPassName() const override { return "AMDGPU RB select"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  // This pass assigns register banks to all virtual registers, and we maintain
+  // this property in subsequent passes
+  MachineFunctionProperties getSetProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::RegBankSelected);
+  }
+};
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+                      false)
+INITIALIZE_PASS_END(AMDGPURBSelect, DEBUG_TYPE, "AMDGPU RB select", false,
+                    false)
+
+char AMDGPURBSelect::ID = 0;
+
+char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
+
+FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
+
+bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index e4cc522194f2a9..c77e6b05395acc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -443,6 +443,11 @@ static cl::opt<bool>
                            cl::desc("Enable AMDGPUAttributorPass"),
                            cl::init(true), cl::Hidden);
 
+static cl::opt<bool> NewRegBankSelect(
+    "new-reg-bank-select",
+    cl::desc("Run rb-select and rb-legalize instead of amdgpu-regbankselect"),
+    cl::init(false), cl::Hidden);
+
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   // Register the target
   RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -459,6 +464,8 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeGCNDPPCombineLegacyPass(*PR);
   initializeSILowerI1CopiesLegacyPass(*PR);
   initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
+  initializeAMDGPURBSelectPass(*PR);
+  initializeAMDGPURBLegalizePass(*PR);
   initializeSILowerWWMCopiesPass(*PR);
   initializeAMDGPUMarkLastScratchLoadPass(*PR);
   initializeSILowerSGPRSpillsLegacyPass(*PR);
@@ -1371,7 +1378,12 @@ void GCNPassConfig::addPreRegBankSelect() {
 }
 
 bool GCNPassConfig::addRegBankSelect() {
-  addPass(new AMDGPURegBankSelect());
+  if (NewRegBankSelect) {
+    addPass(createAMDGPURBSelectPass());
+    addPass(createAMDGPURBLegalizePass());
+  } else {
+    addPass(new AMDGPURegBankSelect());
+  }
   return false;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index fed29c3e14aae2..019ec108670fa8 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -57,6 +57,8 @@ add_llvm_target(AMDGPUCodeGen
   AMDGPUExportClustering.cpp
   AMDGPUFrameLowering.cpp
   AMDGPUGlobalISelDivergenceLowering.cpp
+  AMDGPURBSelect.cpp
+  AMDGPURBLegalize.cpp
   AMDGPUGlobalISelUtils.cpp
   AMDGPUHSAMetadataStreamer.cpp
   AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
new file mode 100644
index 00000000000000..880057813adf54
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui-rb-legalize.mir
@@ -0,0 +1,858 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s
+
+---
+name: uniform_in_vgpr
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: uniform_in_vgpr
+    ; CHECK: liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[COPY]](s32)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = COPY $sgpr1
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %2:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+    %6:_(s32) = G_FPTOUI %0(s32)
+    %7:_(s32) = G_ADD %6, %1
+    G_STORE %7(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: back_to_back_uniform_in_vgpr
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: back_to_back_uniform_in_vgpr
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY3]](s32), [[COPY4]](s32)
+    ; CHECK-NEXT: [[FADD:%[0-9]+]]:_(s32) = G_FADD [[COPY]], [[COPY1]]
+    ; CHECK-NEXT: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FADD]](s32)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[COPY2]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = COPY $sgpr1
+    %2:_(s32) = COPY $sgpr2
+    %4:_(s32) = COPY $vgpr0
+    %5:_(s32) = COPY $vgpr1
+    %3:_(p1) = G_MERGE_VALUES %4(s32), %5(s32)
+    %7:_(s32) = G_FADD %0, %1
+    %8:_(s32) = G_FPTOUI %7(s32)
+    %9:_(s32) = G_ADD %8, %2
+    G_STORE %9(s32), %3(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: buffer_load_uniform
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: buffer_load_uniform
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr4
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $sgpr0
+    %4:_(s32) = COPY $sgpr1
+    %5:_(s32) = COPY $sgpr2
+    %6:_(s32) = COPY $sgpr3
+    %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+    %1:_(s32) = COPY $sgpr4
+    %7:_(s32) = COPY $vgpr0
+    %8:_(s32) = COPY $vgpr1
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %11:_(s32) = G_CONSTANT i32 0
+    %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    %13:_(s32) = G_CONSTANT i32 1
+    %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+    %14:_(s32) = G_ADD %16, %13
+    G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: buffer_load_divergent
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+
+    ; CHECK-LABEL: name: buffer_load_divergent
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $vgpr0, $vgpr1, $vgpr2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr3
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr2
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY5]](s32), [[COPY6]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C]](s32), [[COPY4]], [[C]], 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_BUFFER_LOAD]](<4 x s32>)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[C1]]
+    ; CHECK-NEXT: G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $sgpr0
+    %4:_(s32) = COPY $sgpr1
+    %5:_(s32) = COPY $sgpr2
+    %6:_(s32) = COPY $sgpr3
+    %0:_(<4 x s32>) = G_BUILD_VECTOR %3(s32), %4(s32), %5(s32), %6(s32)
+    %1:_(s32) = COPY $vgpr0
+    %7:_(s32) = COPY $vgpr1
+    %8:_(s32) = COPY $vgpr2
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %11:_(s32) = G_CONSTANT i32 0
+    %10:_(<4 x s32>) = G_AMDGPU_BUFFER_LOAD %0(<4 x s32>), %11(s32), %1, %11, 0, 0, 0 :: (dereferenceable load (<4 x s32>), align 1, addrspace 8)
+    %13:_(s32) = G_CONSTANT i32 1
+    %15:_(s32), %16:_(s32), %17:_(s32), %18:_(s32) = G_UNMERGE_VALUES %10(<4 x s32>)
+    %14:_(s32) = G_ADD %16, %13
+    G_STORE %14(s32), %2(p1) :: (store (s32), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: vgpr_and_i64
+legalized: true
+body: |
+  bb.1:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: vgpr_and_i64
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+    ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5
+    ; CHECK-NEXT: [[MV2:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32)
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s64) = G_AND [[MV]], [[MV1]]
+    ; CHECK-NEXT: G_STORE [[AND]](s64), [[MV2]](p1) :: (store (s64), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %0:_(s64) = G_MERGE_VALUES %3(s32), %4(s32)
+    %5:_(s32) = COPY $vgpr2
+    %6:_(s32) = COPY $vgpr3
+    %1:_(s64) = G_MERGE_VALUES %5(s32), %6(s32)
+    %7:_(s32) = COPY $vgpr4
+    %8:_(s32) = COPY $vgpr5
+    %2:_(p1) = G_MERGE_VALUES %7(s32), %8(s32)
+    %10:_(s64) = G_AND %0, %1
+    G_STORE %10(s64), %2(p1) :: (store (s64), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: abs_sgpr_i16
+legalized: true
+body: |
+  bb.1:
+    liveins: $sgpr0, $vgpr0, $vgpr1
+
+    ; CHECK-LABEL: name: abs_sgpr_i16
+    ; CHECK: liveins: $sgpr0, $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr1
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
+    ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
+    ; CHECK-NEXT: G_STORE [[ANYEXT]](s32), [[MV]](p1) :: (store (s16), addrspace 1)
+    ; CHECK-NEXT: S_ENDPGM 0
+    %2:_(s32) = COPY $sgpr0
+    %0:_(s16) = G_TRUNC %2(s32)
+    %3:_(s32) = COPY $vgpr0
+    %4:_(s32) = COPY $vgpr1
+    %1:_(p1) = G_MERGE_VALUES %3(s32), %4(s32)
+    %6:_(s16) = G_ABS %0
+    %7:_(s32) = G_ANYEXT %6(s16)
+    G_STORE %7(s32), %1(p1) :: (store (s16), addrspace 1)
+    S_ENDPGM 0
+...
+
+---
+name: uniform_i1_phi
+legalized: true
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: uniform_i1_phi
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x30000000), %bb.2(0x50000000)
+  ; CHECK-NEXT:   liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr0
+  ; CHECK-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr1
+  ; CHECK-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 6
+  ; CHECK-NEXT:   [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
+  ; CHECK-NEXT:   [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; CHECK-NEXT:   [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[COPY3]](s32), [[C1]]
+  ; CHECK-NEXT:   G_BRCOND [[ICMP1]](s1), %bb.2
+  ; CHECK-NEXT:   G_BR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+  ; CHECK-NEXT:   [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ult), [[COPY2]](s32), [[C2]]
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s1) = G_PHI [[ICMP]](s1), %bb.0, [[ICMP2]](s1), %bb.1
+  ; CHECK-NEXT:   [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[PHI]](s1)
+  ; CHECK-NEXT:   [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+  ; CHECK-NEXT:   [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT]], [[C3]]
+  ; CHECK-NEXT:   G_STORE [[ADD]](s32), [[MV]](p1) :: (store (s32), addrspace 1)
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    successors: %bb.2(0x30000000), %bb.3(0x50000000)
+    liveins: $sgpr0, $sgpr1, $vgpr0, $vgpr1
...
[truncated]

@@ -39,6 +39,8 @@ FunctionPass *createSIFoldOperandsLegacyPass();
FunctionPass *createSIPeepholeSDWALegacyPass();
FunctionPass *createSILowerI1CopiesLegacyPass();
FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
FunctionPass *createAMDGPURBSelectPass();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please spell out RegBank wherever possible, it makes it easier to grep for things in the codebase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will wait for more comments then bulk update everything

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestions since AMDGPURegBankSelect is taken?
AMDGPURBSelect -> ...
AMDGPURBLegalize -> AMDGPURegBankLegalize is fine

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk, I was hoping we could get rid of the other one :) Maybe we could call this one AMDGPUStandaloneRegBankSelect, since it's not inheriting from RegBankSelect (at least if that's the intention for the long term).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just delete the other one? I don't think it's doing anything now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AMDGPURegBankSelect is functional except for some functions with divergent control flow. Deleting it now breaks almost all globalisel regression tests.
When new regbankselect path gets functional we can make it a default path and decide what to do with AMDGPURegBankSelect, maybe rename it to legacy or just delete it.

New pass names as suggested:
AMDGPUStandaloneRegBankSelect
AMDGPURegBankLegalize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can delete the old one and replace it with the generic regbankselect pass. It's not doing anything in the pass other than running the analysis

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting default reg-bank-select to RegBankSelect, new pass will take
AMDGPURegBankSelect

addPass(createAMDGPURBSelectPass());
addPass(createAMDGPURBLegalizePass());
} else {
addPass(new AMDGPURegBankSelect());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just modify this AMDGPURegBankSelect instead of adding a new one? It was meant to use uniformity analysis too. Maybe you can propagate the flag into the pass if you want to keep the old behaviour around in the meantime.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older versions were doing that. New passes are much simpler and don't use anything from generic reg-bank-select(except register banks that come from td file via RegisterBankInfo).
To simplify review process and transition period I would prefer two passes separate from AMDGPURegBankSelect, rb-legalize will also be split across few files.
Old behavior is there to keep upcoming patches small by gradually switching to -new-reg-bank-select.

llvm/lib/Target/AMDGPU/AMDGPURBLegalize.cpp Outdated Show resolved Hide resolved
//===----------------------------------------------------------------------===//
//
/// Lower G_ instructions that can't be inst-selected with register bank
/// assignment given by RB-select based on machine uniformity info.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe mention explicitly that this is expected to run after RB-select? This might be surprising since the usual GISel pipeline has legalization first and then RB-select.

@@ -0,0 +1,858 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=none %s -verify-machineinstrs -o - | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this run rb-legalize? It only returns true, so it should be pretty harmless.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, will update mir test run lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, correction, not true. Machine verifier will complain that Generic virtual register must have a bank in a RegBankSelected function.
Now that I think of it, RBLegalize should probably also require that MachineFunction has Property::RegBankSelected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the point of this test is then. If it's testing the verifier, it should go in test/MachineVerifier (and then doesn't need -verify-machineinstrs)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is precommit to see something in diff when RBSelect and RBLegalize become functional instead of seeing new test with no changes.

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-skeleton branch 3 times, most recently from 8bf0a23 to 84284ba Compare October 28, 2024 14:48
@petar-avramovic petar-avramovic changed the base branch from main to users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect October 28, 2024 14:48
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect branch from 73862a2 to 6f8e867 Compare October 28, 2024 14:57
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-skeleton branch from 84284ba to 3b0aaef Compare October 28, 2024 14:57
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect branch from 6f8e867 to f6c230d Compare October 28, 2024 16:03
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-skeleton branch from 3b0aaef to 07055a7 Compare October 28, 2024 16:03
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-revert-amdgpu-regbankselect branch from f6c230d to 5fa4bae Compare October 30, 2024 14:44
New register bank select for AMDGPU will be split in two passes:
- AMDGPURegBankSelect: select banks based on machine uniformity analysis
- AMDGPURegBankLegalize: lower instructions that can't be inst-selected
  with register banks assigned by AMDGPURegBankSelect.
AMDGPURegBankLegalize is similar to legalizer but with context of
uniformity analysis. Does not change already assigned banks.
Main goal of AMDGPURegBankLegalize is to provide high level table-like
overview of how to lower generic instructions based on available target
features and uniformity info (uniform vs divergent).
See RegBankLegalizeRules.

Summary of new features:
At the moment register bank select assigns register bank to output
register using simple algorithm:
- one of the inputs is vgpr output is vgpr
- all inputs are sgpr output is sgpr.
When function does not contain divergent control flow propagating
register banks like this works. In general, first point is still correct
but second is not when function contains divergent control flow.
Examples:
- Phi with uniform inputs that go through divergent branch
- Instruction with temporal divergent use.
To fix this AMDGPURegBankSelect will use machine uniformity analysis
to assign vgpr to each divergent and sgpr to each uniform instruction.
But some instructions are only available on VALU (for example floating
point instructions before gfx1150) and we need to assign vgpr to them.
Since we are no longer propagating register banks we need to ensure that
uniform instructions get their inputs in sgpr in some way.
In AMDGPURegBankLegalize uniform instructions that are only available on
VALU will be reassigned to vgpr on all operands and read-any-lane vgpr
output to original sgpr output.
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/new-rbs-skeleton branch from 07055a7 to 623266f Compare October 30, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants