Date: 2020-11-21
software | description |
---|---|
OS | Ubuntu-20.04.1 |
Python | 3.8.6 |
Tensorflow-rocm | 2.3.2 |
hardware | Product Name | ISA | CHIP IP |
---|---|---|---|
CPU | Xeon 2620v3 | | |
GPU | RX580 8G | gfx803(Polaris10) | 0x67df |
- On ROCm-3.7+ with gfx803, run the TensorFlow text classification sample. The official TensorFlow sample reproduces this issue almost 90% of the time: https://www.tensorflow.org/tutorials/keras/text_classification
- Many people get this error; please refer to ROCm/ROCm#1265
- Don't know yet
- Workaround 1: I rebuilt rocBLAS with BUILD_WITH_TENSILE_HOST=false, and the problem disappeared. Maybe the gfx803 r9nano_*.yml is out of date? This approach causes a compile failure on ROCm-3.9.
- Workaround 2: keep BUILD_WITH_TENSILE_HOST=true and delete library/src/blas3/Tensile/Logic/asm_full/r9nano_Cijk_Ailk_Bljk_SB.yaml, and the issue is resolved. If I keep just one solution from this file, the issue still reproduces.
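Workaround 2 can be sketched as a small shell helper. This is only an illustration: the function name `remove_r9nano_sb_logic` and the checkout location in the usage comment are assumptions, not part of rocBLAS. Only the relative path of the logic file comes from the notes above.

```shell
# Sketch of Workaround 2 (hypothetical helper, not shipped with rocBLAS).
# $1 is the root of a rocBLAS source checkout.
remove_r9nano_sb_logic() {
    src_dir="$1"
    logic_file="$src_dir/library/src/blas3/Tensile/Logic/asm_full/r9nano_Cijk_Ailk_Bljk_SB.yaml"
    if [ -f "$logic_file" ]; then
        # Delete the Tensile logic file named in Workaround 2.
        rm -- "$logic_file"
        echo "removed: $logic_file"
    else
        # Nothing to delete: report it and signal failure to the caller.
        echo "not found: $logic_file" >&2
        return 1
    fi
}

# Usage (checkout location is an assumption); rebuild rocBLAS afterwards
# with BUILD_WITH_TENSILE_HOST left at true:
# remove_r9nano_sb_logic "$HOME/rocBLAS"
```

The helper returns non-zero when the file is missing, so a wrapping build script can tell whether the checkout was already patched.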
If you installed ROCm-3.9 on gfx803, TensorFlow or PyTorch will crash at the very beginning of a run. The error output is as follows:
work@0b7758c3094d:~/test/examples/mnist$ python3 main.py
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)
rocSPARSE places its CACHED AMDGPU_TARGETS after include Dependencies.cmake, so it is always skipped (CMake does not overwrite an existing cache entry unless FORCE is given). The AMDGPU_TARGETS coming from include Dependencies.cmake is "gfx900;gfx906;gfx908", which does not include gfx803. So rocSPARSE did not compile a gfx803 binary image.
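To check which ISA your own GPU reports, and therefore which target the libraries must have been built for, something like the following can help. It is a minimal sketch: `list_gfx_targets` is a hypothetical helper name, and the `/opt/rocm/bin/rocminfo` path in the usage comment assumes a default ROCm install location.

```shell
# Hypothetical helper: reads rocminfo-style text on stdin and prints
# each distinct gfx ISA name it finds, one per line.
list_gfx_targets() {
    grep -o 'gfx[0-9a-f]*' | sort -u
}

# Usage on a machine with ROCm installed (path is an assumption):
# /opt/rocm/bin/rocminfo | list_gfx_targets
```

If the name printed for your card is not among the targets a library was compiled for, an error like the hipErrorNoBinaryForGpu abort above is the expected outcome.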
There are other issues on the develop branch of rocSPARSE, so we have to switch to the rocm-3.9.x branch and then update CMakeLists.txt with the patch: https://github.com/ROCmSoftwarePlatform/rocSPARSE/commit/f8791e9b09c4ac6d72f56fb3c6663273dce2aea5#commitcomment-43334853
The develop branch issue has since been fixed for gfx803: https://github.com/ROCmSoftwarePlatform/rocSPARSE/commit/7de15942cf9fe0fb7db80e0c45ebb4d1e3086668
If you want to compile rocSPARSE manually, you can use my forked rocSPARSE repository; the patch has been committed to the default branch.
- Install ROCm https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#ubuntu
- Install at least rocm-dev and rocm-libs
sudo apt install rocm-dev rocm-libs
git clone https://github.com/xuhuisheng/rocSPARSE
cd rocSPARSE
bash install.sh -di