Problem detecting ROCm version #2
Comments
Hi, sorry for the delay. I'll make some changes so that you can specify -DROCM_ROOT on the command line. Will that work for you? Does your ROCm installation directory have a .info directory in it? One thing that recent ROCm .deb packages do (not sure which one in particular, probably rocm-dkms) is soft-link /opt/rocm to the latest version, such as /opt/rocm-4.0. This obviously doesn't happen in your installation process.
Yes, that works
Yes, it does.
There are several versions of ROCm installed on the cluster, so you need to load a module to get the version you want. That's why /opt/rocm does not exist.
Please try the latest master branch and run cmake with |
Also, build instructions have changed a little. You can try |
This works, I can configure the code. However, this new line breaks at compile time. I don't have a
I guess I need to build my own ATMI.
Like I mentioned, the ATMI build is now automated via DAGEE's cmake setup, so please try the -DATMI_SRC option with cmake.
That doesn't work. Here is my configuration line:
and here is the error:
It looks like the variables are not passed correctly to ATMI. The first part of the configuration finds |
OK, I see what's going on. ATMI is looking for libhsa-runtime but it doesn't know where ROCm is installed. It looks for ROCM_DIR (which defaults to /opt/rocm). See https://github.com/AMDResearch/atmi/blob/15ab2af651a6a394d37e080bfee3735fcaeb6d7b/src/CMakeLists.txt#L42. As a quick fix, can you try adding -DROCM_DIR on the command line (same value as ROCM_ROOT)? Just trying to see if ATMI will pick it up.
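For anyone following along, the quick fix above amounts to giving both variables the same prefix. A minimal sketch, assuming a POSIX shell; `rocm_cmake_defs` is a hypothetical helper, not part of DAGEE or ATMI, and `/opt/rocm-4.0` is only a placeholder for whatever prefix the loaded module provides:

```shell
#!/bin/sh
# Hypothetical helper (not part of DAGEE or ATMI): print the cmake cache
# definitions that point both ROCM_ROOT and ROCM_DIR at one ROCm prefix.
rocm_cmake_defs() {
    # Fall back to /opt/rocm, the ROCM_DIR default mentioned in the thread.
    root="${1:-/opt/rocm}"
    printf '%s\n' "-DROCM_ROOT=${root} -DROCM_DIR=${root}"
}

# Example use (the prefix is a placeholder; run from the build directory):
#   cmake $(rocm_cmake_defs /opt/rocm-4.0) <path-to-DAGEE-source>
```

Keeping the prefix in one place avoids the two variables drifting apart when switching between module-provided ROCm versions.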
Still doesn't work and the error message is not helping:
OK, looks like we made some progress. I'll try to replicate your setup and hopefully that'll expose more problems. Our scripts do rely on stuff being in /opt/rocm, hence the problems being exposed when that's no longer true. Can you please confirm whether you have the comgr library? It comes in two pieces: ROCM_ROOT/include/amd_comgr.h and ROCM_ROOT/lib/libamd_comgr.so. Thanks.
I have the library but the shared library is in I don't know if that helps but the OS is Red Hat 8.2 |
That definitely helps. Turns out, our cmake module scripts don't look in ROCM_ROOT/lib64 because it doesn't exist in Ubuntu. Stepping back a bit, you might be able to live with not compiling the atmiDenq target. Can you please try commenting out Line 15 in 9031451
You might still run into libcomgr and libhsakmt linking issues, but perhaps you can resolve those with LD_LIBRARY_PATH or LD_PRELOAD. Worth a shot, I think. In the meantime, I'll work on making our scripts more flexible.
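As a rough illustration of the lib versus lib64 point, here is a small shell sketch that checks both subdirectories under a ROCm prefix. `find_rocm_libdir` is a hypothetical helper written for this thread, not something shipped with DAGEE, ATMI, or ROCm:

```shell
#!/bin/sh
# Hypothetical helper: print the first directory under a ROCm prefix that
# contains the given library file. It checks lib first, then lib64, and
# returns non-zero if the file is in neither.
find_rocm_libdir() {
    prefix="$1"
    libname="$2"
    for sub in lib lib64; do
        if [ -e "${prefix}/${sub}/${libname}" ]; then
            printf '%s\n' "${prefix}/${sub}"
            return 0
        fi
    done
    return 1
}
```

One possible use for the LD_LIBRARY_PATH workaround, assuming ROCM_ROOT is set in the environment: `export LD_LIBRARY_PATH="$(find_rocm_libdir "$ROCM_ROOT" libamd_comgr.so):$LD_LIBRARY_PATH"`.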
We are working on fixing the plumbing of cmake vars HSA_ROOT/ROCM_ROOT/ROCM_DIR from DAGEE->ATMI. Clearly there were a few bugs where we were assuming default ROCm paths. Thanks for your patience. However, there may still be some hardcoded cmake paths in some dependent ROCm libraries (like comgr) for
Today I updated DAGEE and the initial ROCm version problem reappeared. It looks like something went wrong when you applied my PR https://github.com/AMDResearch/DAGEE/pull/3/files. My PR only adds
Sorry about that. I must have messed up the versioning in the internal repo's submodules. Please check: 50a9217
I get the following error when trying to configure DAGEE:
The problem is that this line https://github.com/AMDResearch/DAGEE/blob/master/cmakeUtils/rocmVersion.cmake#L8 assumes that the file is in /opt/rocm/. This is not the case on the cluster I am working on. We have multiple versions of ROCm in /opt/rocm-XXX. Changing the path in cmakeUtils/rocmVersion.cmake fixes the problem.
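To make the idea concrete, below is a hedged shell sketch of version detection that does not hardcode /opt/rocm. It is not the logic in cmakeUtils/rocmVersion.cmake. It assumes the version string is stored in a file named version inside the .info directory mentioned earlier in the thread, and `rocm_version` is a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: print the ROCm version recorded under an install prefix.
# Assumption: the version string lives in <prefix>/.info/version.
# The prefix is taken from the first argument, then $ROCM_ROOT, then /opt/rocm.
rocm_version() {
    prefix="${1:-${ROCM_ROOT:-/opt/rocm}}"
    version_file="${prefix}/.info/version"
    if [ -r "${version_file}" ]; then
        # Print only the first line of the version file.
        head -n 1 "${version_file}"
    else
        echo "no readable version file at ${version_file}" >&2
        return 1
    fi
}
```

On a cluster with several installs, this could be called as `rocm_version /opt/rocm-XXX`, with the actual versioned directory substituted, or with ROCM_ROOT exported by the loaded module.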