libucc_tl_cuda.so: undefined symbol: nvmlDeviceGetNvLinkRemoteDeviceType #496
Comments
@bureddy Can you take a look, please?
Hi @zasdfgbnm, the existing autotools code does check for the presence of that function at compile time. Here: Line 79 in e96a6de
We're seeing the undefined symbol message when we run a container that has CUDA 11.6 on a host with an older driver.
What is the driver version? Is it possible to choose the right CUDA toolkit version in the container?
The kernel-mode driver (KMD) was 460.73.01, the user-mode driver (UMD) was 510.47.03, and forward compatibility was used.
Unfortunately, it seems there is no forward compatibility for NVML (libnvidia-ml.so).
@bureddy what do you think about @zasdfgbnm's 2nd question?
I am seeing this error:
Thanks to @crcrpar, who figured out that this is a new API: https://github.com/NVIDIA/nvidia-settings/blame/5b455b89bb73f56818c84444806bc9c928da67ac/src/nvml.h#L6009-L6026
For older driver versions, is it possible to use other APIs to achieve similar functionality? Or at least detect the version and throw a kinder error message?
cc: @ptrblck
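
To illustrate the "kinder error message" idea, here is a minimal sketch of a runtime probe, not UCC's actual code. It assumes a Linux system where NVML is installed under its usual soname `libnvidia-ml.so.1`; the helper names `probe_symbol` and `nvlink_remote_device_type_available` are hypothetical. It uses only the standard `dlopen`/`dlsym` calls to check whether the loaded library exports `nvmlDeviceGetNvLinkRemoteDeviceType` before anything tries to call it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Result of probing a shared library for a single symbol. */
typedef enum {
    PROBE_OK = 0,       /* library loaded and symbol found */
    PROBE_NO_LIBRARY,   /* dlopen() failed */
    PROBE_NO_SYMBOL     /* library loaded, symbol missing */
} probe_status_t;

/* Hypothetical helper (not part of UCC or NVML): checks whether
 * `lib_name` exports `sym_name`. Passing NULL as `lib_name` probes
 * the symbols already visible to the running program. */
probe_status_t probe_symbol(const char *lib_name, const char *sym_name)
{
    void *handle = dlopen(lib_name, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        return PROBE_NO_LIBRARY;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    (void)dlsym(handle, sym_name);
    probe_status_t status =
        (dlerror() == NULL) ? PROBE_OK : PROBE_NO_SYMBOL;

    dlclose(handle);
    return status;
}

/* Hypothetical wrapper: returns 1 if the NVLink query is available,
 * otherwise prints a descriptive message to stderr and returns 0. */
int nvlink_remote_device_type_available(void)
{
    const char *lib = "libnvidia-ml.so.1"; /* assumed NVML soname */
    const char *sym = "nvmlDeviceGetNvLinkRemoteDeviceType";

    switch (probe_symbol(lib, sym)) {
    case PROBE_OK:
        return 1;
    case PROBE_NO_LIBRARY:
        fprintf(stderr, "%s could not be loaded; is an NVIDIA driver "
                        "installed on this host?\n", lib);
        return 0;
    default:
        fprintf(stderr, "%s does not export %s; the host driver is "
                        "probably older than the NVML headers this "
                        "binary was built against.\n", lib, sym);
        return 0;
    }
}
```

A caller could run `nvlink_remote_device_type_available()` once at startup and skip or disable the NVLink-dependent path when it returns 0, rather than failing at load time with an undefined symbol. On older glibc versions the program may need to be linked with `-ldl`.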