[Documentation]: Installation. ( hipcc-nvidia ) #3479

Open
MarkCotch opened this issue May 9, 2024 · 9 comments

@MarkCotch

Description of errors

This documentation seems to lack proper information about HIP's external dependencies, specifically the package repositories. I went to install HIP for NVIDIA, and at no point did the instructions say to set up the ROCm repositories.
Even with the ROCm repositories set up, I still find that necessary dependencies are missing, for example:

hip-runtime-nvidia : Depends: hipcc-nvidia but it is not installable

I am unable to find any repository that hosts the "hipcc-nvidia" package. I have checked the repositories I know of at AMD, Radeon, and NVIDIA without success.
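
For anyone triaging this, here is a minimal sketch of how the missing dependency could be confirmed from a shell. It assumes a Debian/Ubuntu system where `apt-cache` is available; `pkg_status` is a hypothetical helper name introduced only for illustration and is not part of any ROCm tooling.

```shell
#!/bin/sh
# Report whether apt knows an installable candidate for a package.
# Prints "available", "missing", or "no-apt" (when apt-cache is not installed).
# pkg_status is a hypothetical helper, not part of ROCm or apt.
pkg_status() {
    if ! command -v apt-cache >/dev/null 2>&1; then
        echo "no-apt"
        return 0
    fi
    # LC_ALL=C keeps the "Candidate:" label untranslated.
    candidate=$(LC_ALL=C apt-cache policy "$1" 2>/dev/null | awk '/Candidate:/ {print $2}')
    if [ -n "$candidate" ] && [ "$candidate" != "(none)" ]; then
        echo "available"
    else
        echo "missing"
    fi
}

# Package names taken from the error message above.
for pkg in hip-runtime-nvidia hipcc-nvidia; do
    echo "$pkg: $(pkg_status "$pkg")"
done
```

If the first package reports "available" and the second "missing", that would match the "Depends: hipcc-nvidia but it is not installable" message, pointing at the repository contents rather than the local setup.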

Attach any links, screenshots, or additional evidence you think will be helpful.

No response

@MarkCotch changed the title from "[Documentation]: Installation." to "[Documentation]: Installation. ( hipcc-nvidia )" on May 9, 2024
@felixcool200

I encountered this issue as well.

I am trying to install it inside a Docker container:

# Use the official ROCm development image
FROM rocm/dev-ubuntu-22.04:latest

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV HIP_PATH=/opt/rocm/hip
ENV CUDA_PATH=/usr/local/cuda
ENV PATH=$HIP_PATH/bin:$CUDA_PATH/bin:$PATH
ENV LD_LIBRARY_PATH=$CUDA_PATH/lib64:$HIP_PATH/lib:$LD_LIBRARY_PATH

# Install necessary packages
RUN apt-get update && \
    apt-get install -y \
        cmake \
        git \
        wget \
        mesa-utils \
        freeglut3-dev \
        libglu1-mesa-dev \
    && rm -rf /var/lib/apt/lists/*

# Install the CUDA repository keyring (sudo dropped, assuming the image's default user is root)
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb

#RUN apt update && \
#    apt install software-properties-common -y && \
#    add-apt-repository ppa:graphics-drivers && \
#    apt install nvidia-driver-535 -y

RUN apt-get update && \
    apt-get install -y \
    cuda \
    nvidia-gds \
    hip-runtime-nvidia \
    hip-dev \
    && rm -rf /var/lib/apt/lists/*

# Create a working directory
WORKDIR /usr/local/src

# Copy the code to the container
COPY ./ /usr/local/src/MYAPP

# Set the XDG_RUNTIME_DIR environment variable
ENV XDG_RUNTIME_DIR=/tmp/xdg
RUN mkdir -p /tmp/xdg && chmod 700 /tmp/xdg

# Compile the HIP program with OpenGL/GLUT support
RUN cd /usr/local/src/MYAPP && \
    /opt/hip/bin/hipcc \
    -o MYAPP \
    -I /usr/local/src/MYAPP/include \
    src/kernels.hip src/main.cpp \
    -lGL -lGLU -lglut -Wall -Werror -O3

# Set the entry point to run the compiled HIP program
ENTRYPOINT ["/usr/local/src/MYAPP/MYAPP"]

@szellmann

I'm running into the same issue. It seems to me that the dependency chain of the packages is broken, and I am really rooting for a fix :-)

@ppanchad-amd

@MarkCotch An internal ticket has been created to fix this issue. Thanks!

@szellmann

To add to this, the issue does not seem to be documentation-only: there is a broken package dependency on hipcc-nvidia, which doesn't seem to exist anymore. This is with a fresh ROCm install on Ubuntu 22.04, following the ROCm documentation on installing 6.1.1.

I agree, though, that the documentation could be improved. I realize NVIDIA is probably not that well supported, though it is really helpful for development. For example, following all the instructions (via the link to "install ROCm in general"), the docs want me to install amdgpu-dkms even on an NVIDIA system, which obviously doesn't make much sense. A bit more guidance on this would be appreciated (though it's understandable if this is not the primary focus of this project :-) )

@DucHUNG312

I also got this error, and I had to build HIP from source

@ReeRichard

Interesting, and this worked for you? Was this with the latest release 6.1.2? Because with 6.1.1 this was not working either.

@ppanchad-amd

The fix will be available in the upcoming ROCm 6.2 release. Thanks!

@elsampsa

As of today, I'm trying with ROCm 6.2, with no luck.

As I understand it, the hipcc command should handle the whole process of translating HIP to CUDA and then invoking nvcc, etc. (did I get this right?)

But the problem seems to be that --cuda-gpu-arch (which seems to be a simple alias for --offload-arch) doesn't work:

hipcc --cuda-gpu-arch=sm_80 /home/sampsa/amd/doodles/hip_doodle0/src/vector_add.hip -o pska.o
clang++: error: invalid target ID 'sm_80'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')

If I change it to --cuda-gpu-arch=gfx908 (because that's just an alias for --offload-arch=gfx908), it at least starts compiling.

Related: ROCm/ROCm#2975

So to conclude: --offload-arch is broken?
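
One possible reading of the error above, offered as an assumption rather than a confirmed diagnosis: the message comes from clang++, which suggests hipcc was targeting the AMD platform, where only gfx-style names are accepted. The sketch below checks which platform is in effect. `HIP_PLATFORM` and `hipconfig --platform` are existing HIP mechanisms; `detect_hip_platform` and `example_arch` are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Decide which platform hipcc would likely target:
# HIP_PLATFORM wins if set, otherwise ask hipconfig, otherwise "unknown".
# detect_hip_platform is a hypothetical helper, not part of HIP.
detect_hip_platform() {
    if [ -n "${HIP_PLATFORM:-}" ]; then
        echo "$HIP_PLATFORM"
    elif command -v hipconfig >/dev/null 2>&1; then
        hipconfig --platform
    else
        echo "unknown"
    fi
}

# Hypothetical helper: an example architecture name per platform,
# using the two names that appear in this thread.
example_arch() {
    case "$1" in
        nvidia) echo "sm_80" ;;
        amd)    echo "gfx908" ;;
        *)      echo "" ;;
    esac
}

platform=$(detect_hip_platform)
echo "platform: $platform, example arch: $(example_arch "$platform")"
```

If this reports "amd" on a machine with only an NVIDIA GPU, the sm_80 rejection would be expected behavior for that platform setting rather than a broken --offload-arch.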

@han-minhee

I experienced similar issues, and I'm honestly not sure what fixed it. I have Ubuntu 22.04.5 with ROCm 6.2. The things I've tried:

  1. Installing ROCm 6.2 again from the repository after purging everything
  2. Installing the latest CMake from the Kitware repository
  3. Setting environment variables in the ~/.bashrc (HIP_PLATFORM=nvidia, HIP_COMPILER=nvcc, HIP_RUNTIME=cuda)
  4. Getting hipblas from the GitHub repository (I'm not sure if it's relevant)

I'm trying to reproduce the process.
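
For reference, the environment settings from step 3 could be written as the following lines in ~/.bashrc. This is only a sketch of that one step; the comment above does not establish which of the four steps actually resolved the problem, nor whether all three variables are required.

```shell
#!/bin/sh
# Values exactly as listed in step 3 above; whether all three are needed
# is an open question in this thread.
export HIP_PLATFORM=nvidia
export HIP_COMPILER=nvcc
export HIP_RUNTIME=cuda

# Echo the values back so a new shell session can be checked quickly.
echo "HIP_PLATFORM=$HIP_PLATFORM"
echo "HIP_COMPILER=$HIP_COMPILER"
echo "HIP_RUNTIME=$HIP_RUNTIME"
```

After opening a new shell, `hipconfig --platform` (if installed) can be used to confirm that the platform setting was picked up.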
