
AMD GPU Support via HIP #4

Closed
pierotofy opened this issue Feb 16, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@pierotofy
Owner

It should be possible to port the CUDA code to HIP and run this on AMD GPUs as well.

@pierotofy pierotofy added the enhancement New feature or request label Feb 16, 2024
@pfxuan
Collaborator

pfxuan commented Mar 1, 2024

@pierotofy Do you have a high-level suggestion for how to port the CUDA code to AMD HIP? I'm planning to give it a quick try with ZLUDA. Any thoughts?

@pierotofy
Owner Author

I personally would start from https://rocmdocs.amd.com/projects/HIPIFY/en/latest/ and https://rocm.docs.amd.com/projects/HIP/en/latest/

ZLUDA seems interesting too, but I haven't tried it.

@pierotofy
Owner Author

Quick update: HIP is missing some functions that are used in gsplat's CUDA code. This will require some manual changes as an automatic port does not seem feasible.

@pfxuan
Collaborator

pfxuan commented Mar 6, 2024

hipify-clang seems more robust than hipify-perl for porting CUDA code. I was able to run through these 3 CUDA code translations after carefully tweaking the CUDA, ROCm, and clang/LLVM versions:

os: [ubuntu-22.04]
torch-version: [2.2.1]
cuda-version: [11.8.0]
rocm-version: [5.6.1]
llvm-version: [16]

[HIPIFY] info: file './vendor/gsplat/backward.cu' statistics:
  CONVERTED refs count: 24
  UNCONVERTED refs count: 0
  CONVERSION %: 100.0
  REPLACED bytes: 22
  TOTAL bytes: 18934
  CHANGED lines of code: 2
  TOTAL lines of code: 502
  CODE CHANGED (in bytes) %: 0.1
  CODE CHANGED (in lines) %: 0.4
  TIME ELAPSED s: 1.63
[HIPIFY] info: CONVERTED refs by type:
  device_function: 23
  include: 1
[HIPIFY] info: CONVERTED refs by API:
  CUDA RT API: 24
[HIPIFY] info: CONVERTED refs by names:
  __expf: 2
  atomicAdd: 16
  cooperative_groups.h: 1
  max: 1
  min: 4

[HIPIFY] info: file './vendor/gsplat/bindings.cu' statistics:
  CONVERTED refs count: 4
  UNCONVERTED refs count: 0
  CONVERSION %: 100.0
  REPLACED bytes: 75
  TOTAL bytes: 20985
  CHANGED lines of code: 4
  TOTAL lines of code: 624
  CODE CHANGED (in bytes) %: 0.4
  CODE CHANGED (in lines) %: 0.6
  TIME ELAPSED s: 14.87
[HIPIFY] info: CONVERTED refs by type:
  include: 2
  include_cuda_main_header: 2
[HIPIFY] info: CONVERTED refs by API:
  CUDA Driver API: 1
  CUDA RT API: 3
[HIPIFY] info: CONVERTED refs by names:
  cooperative_groups.h: 1
  cuda.h: 1
  cuda_runtime.h: 1
  cuda_runtime_api.h: 1
  
[HIPIFY] info: file './vendor/gsplat/forward.cu' statistics:
  CONVERTED refs count: 7
  UNCONVERTED refs count: 0
  CONVERSION %: 100.0
  REPLACED bytes: 22
  TOTAL bytes: 16216
  CHANGED lines of code: 2
  TOTAL lines of code: 463
  CODE CHANGED (in bytes) %: 0.1
  CODE CHANGED (in lines) %: 0.4
  TIME ELAPSED s: 0.90
[HIPIFY] info: CONVERTED refs by type:
  device_function: 6
  include: 1
[HIPIFY] info: CONVERTED refs by API:
  CUDA RT API: 7
[HIPIFY] info: CONVERTED refs by names:
  __expf: 2
  __syncthreads_count: 1
  cooperative_groups.h: 1
  min: 3
[HIPIFY] info: file 'GLOBAL' statistics:
  CONVERTED refs count: 35
  UNCONVERTED refs count: 0
  CONVERSION %: 100.0
  REPLACED bytes: 119
  TOTAL bytes: 56135
  CHANGED lines of code: 8
  TOTAL lines of code: 1589
  CODE CHANGED (in bytes) %: 0.2
  CODE CHANGED (in lines) %: 0.5
  TIME ELAPSED s: 17.39
[HIPIFY] info: CONVERTED refs by type:
  device_function: 29
  include: 4
  include_cuda_main_header: 2
[HIPIFY] info: CONVERTED refs by API:
  CUDA Driver API: 1
  CUDA RT API: 34
[HIPIFY] info: CONVERTED refs by names:
  __expf: 4
  __syncthreads_count: 1
  atomicAdd: 16
  cooperative_groups.h: 3
  cuda.h: 1
  cuda_runtime.h: 1
  cuda_runtime_api.h: 1
  max: 1
  min: 7
[HIPIFY] info: TOTAL statistics:
  CONVERTED files: 3
  PROCESSED files: 3

Let's see if we can compile them into an executable. To further test compatibility and performance, I'll probably need to find a way to hook a 2nd GPU (AMD Radeon RX 6700 XT) into my mini desktop.

@pierotofy
Owner Author

pierotofy commented Mar 6, 2024

I had trouble with the functions in cooperative_groups/reduce.h, which don't seem to have an equivalent in HIP (though I used hipify-perl). I managed to compile it after making some manual changes, but hit some crashes that I have yet to investigate.

Super experimental, proof of concept branch https://github.com/pierotofy/OpenSplat/tree/hippoc

@pfxuan
Collaborator

pfxuan commented Mar 7, 2024

Unfortunately, hipify-clang generates very similar code to hipify-perl. It seems most functions can be directly mapped to HIP except cg::reduce() and cg::this_thread_block(). There should be a way to map them manually:

auto block = hipThreadIdx_x + hipBlockDim_x * hipBlockIdx_x;
int32_t tile_id =
    hipBlockIdx_y * tile_bounds.x + hipBlockIdx_x;
unsigned i =
    hipBlockIdx_y * hipBlockDim_y + hipThreadIdx_y;
unsigned j =
    hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;

inline __device__ void warpSum(float& val, const int& tile){
    for (int offset = tile / 2; offset > 0; offset /= 2) {
        // HIP has no __shfl_down_sync; use the unmasked __shfl_down instead
        val += __shfl_down(val, offset);
    }
}

@pfxuan
Collaborator

pfxuan commented Mar 7, 2024

BTW, ROCm HIP build pipeline is ready. What's the best way to integrate it with your working branch (hippoc)?

@pfxuan
Collaborator

pfxuan commented Mar 10, 2024

@pierotofy Based on your current work (hippoc), I created a new PR#36 to consolidate our updates. It would be great to run an actual test. On my end, it might take some time to find a way to hook an extra AMD GPU into my desktop: it already has an Nvidia 3080 Ti up and running, and chassis space is very tight right now.

BTW, if you can find some AMD/Nvidia GPU resources, building a self-hosted GPU runner could be a good option for daily build & benchmark tasks.

@pierotofy
Owner Author

I don't have an AMD GPU either, but I was thinking of using https://github.com/ROCm/HIP-CPU to test.

@pfxuan
Collaborator

pfxuan commented Mar 10, 2024

It's a pretty interesting project and could become useful for the Mac port. Thanks for pointing it out.
If we want to benchmark OpenSplat performance on datacenter GPUs, the AMD Accelerator Cloud (AAC) might be able to provide the required resources: https://github.com/amddcgpuce/AMDAcceleratorCloudGuides

@pierotofy
Owner Author

pierotofy commented Mar 11, 2024

Found that HIP-CPU does not ship the cooperative_groups.h header, so the entire cg namespace is missing. That's problematic, since the gsplat implementation references it heavily.

For CPU support I would either have to refactor gsplat to not use cooperative groups, or rewrite the entire thing for the CPU (or wait and hope for HIP-CPU support: ROCm/HIP-CPU#60)

@pfxuan
Collaborator

pfxuan commented Mar 16, 2024

Finally swapped the Nvidia card for the AMD GPU, almost ready for the battle test 😅
[Photos attached: IMG_6239, IMG_6240, IMG_6243]

@pfxuan
Collaborator

pfxuan commented Mar 17, 2024

AMD GPU is now working 🚀
