I'm running JAX on a shared server where I don't have admin rights. For some time I could run my scripts on the GPUs without problems, but recently they started failing with the following error (I can't tell whether the server was updated in the meantime):
RuntimeError: CUDA operation failed: cudaGetErrorString symbol not found.
CUDA 11.0 is installed on the server:
$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 9 Aug 11 2020 /usr/local/cuda -> cuda-11.0
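Since the error names a specific symbol, one thing that could be checked is whether the CUDA runtime library under this install can be loaded and exports `cudaGetErrorString`. The probe below is only a sketch: the helper name `exports_symbol` is hypothetical, and the library path `/usr/local/cuda/lib64/libcudart.so` is an assumption about where the runtime lives on this server.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True or False if the library loads, None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # The shared library is missing or cannot be loaded by this process.
        return None
    # ctypes resolves the symbol lazily; hasattr is False when it is absent.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumed location of the CUDA runtime; adjust to the actual install.
    path = "/usr/local/cuda/lib64/libcudart.so"
    print(path, exports_symbol(path, "cudaGetErrorString"))
```

A result of `None` would suggest that the library itself cannot be loaded (for example a missing file or a broken symlink), while `False` would suggest it loads but lacks the symbol.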
I installed jax and the jaxlib build matching CUDA 11.0:
$ pip show jax
Name: jax
Version: 0.2.9
$ pip show jaxlib
Name: jaxlib
Version: 0.1.61+cuda110
The following minimal script reproduces the error:
from jax import random
rng_key = random.PRNGKey(42)
rng_key, init_key = random.split(rng_key)
Before running the script, I export the location of the CUDA install:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
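The same flag can also be set from inside Python, which rules out the shell environment not reaching the process. This is a minimal sketch: the helper `set_cuda_data_dir` is a hypothetical name, and it assumes the variable has to be in place before JAX is imported, since as far as I understand XLA reads it when the backend initializes.

```python
import os


def set_cuda_data_dir(cuda_dir, env=os.environ):
    """Add --xla_gpu_cuda_data_dir to XLA_FLAGS, keeping any other flags."""
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    # Drop any earlier setting of the same flag so it is not duplicated.
    kept = [f for f in env.get("XLA_FLAGS", "").split()
            if not f.startswith("--xla_gpu_cuda_data_dir=")]
    env["XLA_FLAGS"] = " ".join(kept + [flag])
    return env["XLA_FLAGS"]


# Call this before the first `import jax` in the script.
set_cuda_data_dir("/usr/local/cuda")
```

Printing `os.environ["XLA_FLAGS"]` right after the call shows exactly what the process will see.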
What's your advice for debugging this further?