# AMD GPU Support via HIP #4
@pierotofy Do you have a high-level suggestion on the path for porting the CUDA code to AMD HIP? I'm planning to give ZLUDA a quick try. Any thoughts?
I personally would start from https://rocmdocs.amd.com/projects/HIPIFY/en/latest/ and https://rocm.docs.amd.com/projects/HIP/en/latest/. ZLUDA seems interesting too, but I haven't tried it.
Quick update: HIP is missing some functions that are used in gsplat's CUDA code. This will require some manual changes, as an automatic port does not seem feasible.
hipify-clang seems more robust than hipify-perl for porting CUDA code. I was able to run through these three CUDA code translations after carefully tweaking the CUDA, ROCm, and clang/LLVM versions, on `os: [ubuntu-22.04]`.
Let's see if we can compile them into an executable file. To further test compatibility and performance, I probably need to find a way to hook a second GPU (AMD Radeon RX 6700 XT) into my mini desktop.
I had trouble with some of the functions. Super experimental, proof-of-concept branch: https://github.com/pierotofy/OpenSplat/tree/hippoc
Unfortunately, hipify-clang generates very similar code to hipify-perl. But it seems like most functions can be directly mapped to HIP, except `cg::reduce()` and `cg::this_thread_block()`. There should be a way to map them manually:

```cpp
auto block = hipThreadIdx_x + hipBlockDim_x * hipBlockIdx_x;
int32_t tile_id = hipBlockIdx_y * tile_bounds.x + hipBlockIdx_x;
unsigned i = hipBlockIdx_y * hipBlockDim_y + hipThreadIdx_y;
unsigned j = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
```

```cpp
inline __device__ void warpSum(float& val, const int& tile){
    for (int offset = tile / 2; offset > 0; offset /= 2) {
        val += __shfl_down_sync(0xffffffff, val, offset);
    }
}
```
BTW, the ROCm HIP build pipeline is ready. What's the best way to integrate it with your working branch (hippoc)?
@pierotofy Based on your current work (hippoc), I created a new PR #36 to consolidate our updates. It would be great to have an actual test. On my end, it might take some time to find a way to hook an extra AMD GPU into my desktop; it already has an NVIDIA 3080 Ti running, and chassis space is very tight right now. BTW, if you can find some AMD/NVIDIA GPU resources, building a self-hosted GPU runner could be a good option for daily build & benchmark tasks.
I don't have an AMD GPU either, but I was thinking of using https://github.com/ROCm/HIP-CPU to test. |
It's a pretty interesting project and could become useful for a Mac port. Thank you for pointing it out.
Found that HIP-CPU does not have the cooperative_groups.h header, so the entire `cg` namespace is missing. That's problematic, since there are a lot of references to it in the gsplat implementation. For CPU support I would either have to refactor gsplat not to use cooperative groups, or rewrite the entire thing for CPU (or wait and hope for HIP-CPU support: ROCm/HIP-CPU#60).
AMD GPU is now working 🚀 |
It should be possible to port the CUDA code to HIP and run this on AMD GPUs as well.