[Issue]: rocm/6.2.0 installation from source on NVIDIA (Perlmutter machine) #3570

Closed
rgayatri23 opened this issue Aug 12, 2024 · 11 comments

@rgayatri23

Problem Description

I was following the instructions provided here to install HIP from source, and I get the following error:

cmake -DHIP_COMMON_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip -DHIP_PLATFORM=nvidia -DCMAKE_INSTALL_PREFIX=/global/cfs/cdirs/nstaff/rgayatri/software/hip/clr/build/build/install -DHIP_CATCH_TEST=0 -DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=OFF -DHIPNV_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hipother/hipnv ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Cray Programming Environment 2.7.30 CXX
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/craype/2.7.30/bin/CC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- HIPCC Binary Directory: /opt/rocm/bin
CMake Error at CMakeLists.txt:51 (message):
  Please pass hipcc/build or hipcc/bin using -DHIPCC_BIN_DIR.

Am I missing a step? I am unsure why the build is looking in /opt/rocm.

Operating System

SLES

CPU

AMD EPYC 7713 64-Core

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@rgayatri23
Author

Edit: The GPU listed above is not correct. The Perlmutter machine has NVIDIA A100 GPUs. I had to pick a value in order to submit the issue, and NVIDIA GPUs were not available in the list of options.

@cjatin
Contributor

cjatin commented Aug 12, 2024

You need hipcc as well.
Can you search for a package named hipcc and install it? Then point to it via -DHIPCC_BIN_DIR=<dir>.

@rgayatri23
Author

I am trying to install hipcc via this package. Is that a different package?

@cjatin
Contributor

cjatin commented Aug 12, 2024

It's a different package.
You can clone it: https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc and then point to its bin directory with -DHIPCC_BIN_DIR=llvm-project/amd/hipcc/bin

@rgayatri23
Author

Thanks @cjatin. I am a bit confused at this point. In order to build hipcc, do I need to point it to HIPCC_BIN_DIR?

Additionally, while I was able to build the hipcc compiler using your solution, when I tried a simple test program I got the following error:

hipcc.bin not present; install HIPCC binaries before proceeding

@cjatin
Contributor

cjatin commented Aug 14, 2024

HIPCC can be either a Perl script or a C++ application.
I think by default it tries to use the C++ application; you can bypass that by setting the environment variable HIP_USE_PERL_SCRIPTS=1.
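
A minimal sketch of that fallback, assuming hipcc.bin would normally sit in the same directory as the hipcc wrapper. The helper name select_hipcc_mode is hypothetical; only the HIP_USE_PERL_SCRIPTS variable comes from the comment above.

```shell
# Hypothetical helper: fall back to the Perl hipcc scripts when the
# compiled hipcc.bin is missing from the given bin directory.
# Assumption: hipcc.bin is installed next to the hipcc wrapper.
select_hipcc_mode() {
    hipcc_bin_dir="$1"
    if [ -x "$hipcc_bin_dir/hipcc.bin" ]; then
        # Compiled driver is available; keep the default behaviour.
        unset HIP_USE_PERL_SCRIPTS
    else
        # No hipcc.bin found; ask the wrapper to use the Perl scripts.
        HIP_USE_PERL_SCRIPTS=1
        export HIP_USE_PERL_SCRIPTS
    fi
}

# Usage (the path is a placeholder):
#   select_hipcc_mode /path/to/rocm/bin && hipcc square.cu -o square.ex
```

Checking for the binary first means the variable is set only when the compiled driver really is absent.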

If you want to build hipcc.bin instead:
cd llvm-project/amd/hipcc
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<where to install hipcc>
make -j install

This will install hipcc.bin to the desired location

I would recommend building hipcc and then pointing to the hipcc install directory via -DHIPCC_BIN_DIR while building clr.
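
As a rough sketch of that last step, the helper below prints the CLR configure arguments used in the original report with -DHIPCC_BIN_DIR added. The function name and its two parameters are hypothetical, and it assumes the hip, hipother and clr checkouts sit side by side under one source root. The thread does not settle whether HIPCC_BIN_DIR should be the install prefix or its bin subdirectory; the CMake error message suggests a directory that contains the hipcc executables.

```shell
# Print the CLR configure arguments from the original report, plus
# -DHIPCC_BIN_DIR. Both parameters are illustrative assumptions.
clr_cmake_args() {
    src_root="$1"    # directory containing hip/, hipother/ and clr/
    hipcc_dir="$2"   # directory that contains the hipcc executables
    printf '%s\n' \
        "-DHIP_COMMON_DIR=$src_root/hip" \
        "-DHIP_PLATFORM=nvidia" \
        "-DHIP_CATCH_TEST=0" \
        "-DCLR_BUILD_HIP=ON" \
        "-DCLR_BUILD_OCL=OFF" \
        "-DHIPNV_DIR=$src_root/hipother/hipnv" \
        "-DHIPCC_BIN_DIR=$hipcc_dir"
}

# Usage from clr/build (paths must not contain whitespace here):
#   cmake $(clr_cmake_args "$SRC_ROOT" "$HIPCC_DIR") \
#         -DCMAKE_INSTALL_PREFIX="$INSTALL_PREFIX" ..
```

Keeping the arguments in one function makes it easier to re-run the configure step with a different hipcc location while the rest of the options stay unchanged.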

@rgayatri23
Author

I am trying to build a simple test from hip-tests, and it looks like hipcc is unable to find CUDA. Is there a way to pass the location of nvcc? It is installed in a non-standard location. I tried passing CUDA_TOOLKIT_ROOT_DIR both when installing and when compiling the app, but it was ignored in both cases.

rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> cmake -DCMAKE_CXX_COMPILER=hipcc -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME ../
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is unknown
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc - broken
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_46803/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_46803.dir/build.make CMakeFiles/cmTC_46803.dir/build
    gmake[1]: Entering directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc    -o CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    sh: /usr/local/cuda/bin/nvcc: No such file or directory
    failed to execute:/usr/local/cuda/bin/nvcc  -Wno-deprecated-gpu-targets  -isystem /usr/local/cuda/include -isystem "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/include" -x cu  -o "CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o" -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    gmake[1]: *** [CMakeFiles/cmTC_46803.dir/build.make:78: CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o] Error 127
    gmake[1]: Leaving directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_46803/fast] Error 2





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:23 (project)


-- Configuring incomplete, errors occurred!
See also "/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeOutput.log".
See also "/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeError.log".
rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> which nvcc
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

@scchan
Contributor

scchan commented Aug 14, 2024

Could you try setting the env var CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 and see if that works around the non-standard location?
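
One way to avoid hard-coding that path is to derive it from wherever nvcc lives. This is only a sketch: the function name is hypothetical, and it assumes nvcc sits in a bin directory directly under the toolkit root, as in the `which nvcc` output above.

```shell
# Derive the CUDA toolkit root from the nvcc location by stripping the
# trailing /bin/nvcc. The function name is hypothetical.
cuda_root_from_nvcc() {
    nvcc_path="$1"
    dirname "$(dirname "$nvcc_path")"
}

# Usage (only meaningful when nvcc is on PATH):
#   CUDA_PATH=$(cuda_root_from_nvcc "$(command -v nvcc)")
#   export CUDA_PATH
```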

@rgayatri23
Author

> Could you try setting the env var CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 and see if that works around the non-standard location?

No, that did not help.

@rgayatri23
Author

rgayatri23 commented Aug 16, 2024

Update:
I was able to compile the app directly with hipcc square.cu -o square.ex, but the CMake build fails with a different error:
nvcc fatal : Unknown option '-rdynamic'

rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> cmake -DCMAKE_CXX_COMPILER=hipcc ../
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc - broken
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_02053/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_02053.dir/build.make CMakeFiles/cmTC_02053.dir/build
    gmake[1]: Entering directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc    -o CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    HIP_PATH=/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0
    HIP_PLATFORM=nvidia
    HIP_COMPILER=nvcc
    HIP_RUNTIME=cuda
    CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2
    hipcc-args: -o CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    hipcc-cmd: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets  -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/include -isystem "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/include" -x cu  -o "CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o" -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Linking CXX executable cmTC_02053
    /global/u1/r/rgayatri/.local/cmake/bin/cmake -E cmake_link_script CMakeFiles/cmTC_02053.dir/link.txt --verbose=1
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o cmTC_02053
    HIP_PATH=/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0
    HIP_PLATFORM=nvidia
    HIP_COMPILER=nvcc
    HIP_RUNTIME=cuda
    CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2
    hipcc-args: -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o cmTC_02053
    nvcc fatal   : Unknown option '-rdynamic'
    hipcc-cmd: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets -lcuda -lcudart -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64  -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o "cmTC_02053"
    failed to execute:/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets -lcuda -lcudart -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64  -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o "cmTC_02053"
    gmake[1]: *** [CMakeFiles/cmTC_02053.dir/build.make:99: cmTC_02053] Error 1
    gmake[1]: Leaving directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_02053/fast] Error 2





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:23 (project)

@darren-amd

darren-amd commented Oct 1, 2024

Hi @rgayatri23,

I was able to reproduce your issue when invoking cmake as cmake -DCMAKE_CXX_COMPILER=hipcc ../. CMake adds the -rdynamic flag to the link step, and that GNU flag is not supported by nvcc. I traced it to this file: Linux-GNU.cmake from the cmake source code.

However, I was able to get cmake working without passing in hipcc as follows:
cmake -DHIP_COMPILER=nvcc -DHIP_PLATFORM=nvidia -DHIP_RUNTIME=cuda ..
since hipcc should already be installed as part of ROCm (I tested with ROCm 6.2).

Please give that a try and let me know if you run into any issues. Thanks!
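
To check whether a given configure attempt hits the -rdynamic problem described above, one option is to inspect the link script CMake generates (the log above shows one at CMakeFiles/cmTC_02053.dir/link.txt). The helper below is a hypothetical diagnostic sketch, not part of HIP or CMake.

```shell
# Hypothetical diagnostic: succeed if a CMake-generated link script
# (such as CMakeFiles/<target>.dir/link.txt) passes -rdynamic.
link_uses_rdynamic() {
    link_script="$1"
    # -e marks the pattern explicitly so the leading dash is not
    # parsed as a grep option.
    grep -q -e '-rdynamic' "$link_script"
}

# Usage (the target directory name is a placeholder):
#   if link_uses_rdynamic CMakeFiles/<target>.dir/link.txt; then
#       echo "link line contains -rdynamic, which nvcc rejects"
#   fi
```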
