why does nvidia-container-cli load libnvidia-ml via dlopen rather than linking directly? #223

Open
deitch opened this issue Oct 3, 2023 · 4 comments

Comments

@deitch

deitch commented Oct 3, 2023

Why does nvidia-container-cli load libnvidia-ml via dlopen rather than linking directly? Because it uses dlopen(), the library has to be found on the loader's search path at run time. This creates a few issues:

  • ldd does not report the dependency, and compiling from source gives no indication that I will need it
  • if it is not available on the system at run time, I will be stuck, without knowing it in advance

If I am running anything other than an OS that comes with that package pre-installed, I am stuck. And there are lots of custom OS builds out there, different versions of an OS, etc.

Separately, if I did want to install it, how do I get it for other OSes, e.g. musl-based like Alpine? Or build from source? I managed to get everything in this repo built from source, including on Alpine, but it fails at run time because of that libnvidia-ml dependency.
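
To make the point about run-time discovery concrete, here is a minimal sketch of a pre-flight check that tries to load the library the same way dlopen() would. It assumes the soname is `libnvidia-ml.so.1` (the name the NVIDIA driver packages normally install). The helper name `probe_shared_library` is hypothetical and is not part of nvidia-container-cli.

```python
import ctypes


def probe_shared_library(soname):
    """Try to load `soname` at run time, much as dlopen() would.

    Returns (True, "") on success, or (False, reason) if the dynamic
    loader cannot find or load the library.
    """
    try:
        # On Linux, ctypes.CDLL goes through dlopen(), so the lookup
        # uses the loader's normal search path (ld.so cache,
        # LD_LIBRARY_PATH, default library directories).
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Assumed soname; adjust if your driver installs a different one.
    ok, reason = probe_shared_library("libnvidia-ml.so.1")
    if ok:
        print("libnvidia-ml.so.1 is loadable")
    else:
        print("libnvidia-ml.so.1 is not loadable:", reason)
```

A check like this only surfaces the missing dependency earlier. It does not make it visible to ldd, because a dlopen()-ed library never appears in the binary's dynamic section.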

@ubuntuyeah

What OS did you build on and what packages did you install?

@deitch
Author

deitch commented Oct 8, 2023

I did the compile on an Ubuntu-based system, but I plan on using it on Alpine as well as possibly a custom-composed OS, so possibly no package manager.

@SomeoneSerge

SomeoneSerge commented Jan 8, 2024

> Separately, if I did want to install it, how do I get it for other OSes, e.g. musl-based like Alpine?

I think this might actually be a larger-scale issue: since NVIDIA distributes most of its drivers and libraries in binary form, don't most of those also only come linked against glibc? E.g. the CUDA libraries, except for some chosen Jetson platforms?

> dlopen rather than linking directly?

I'm not sure this would work in case of libnvidia-ml.so, because it's part of the "userspace driver" and pins the kernel module version?

❯ strings /run/opengl-driver/lib/libnvidia-ml.so | grep "API mismatch"
NVIDIA: API mismatch: the NVIDIA kernel module has version %s,
NVIDIA: API mismatch: this NVIDIA driver component has version

Unlike libcuda.so, libnvidia-ml.so also comes without the "stub" libraries, I believe.
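
As a rough illustration of the version pinning described above, the sketch below compares the kernel module version with the version encoded in the userspace library's file name. It rests on two assumptions: that `/proc/driver/nvidia/version` contains a line of the form `NVRM version: ... Kernel Module  <version> ...`, and that the `libnvidia-ml.so.1` symlink resolves to a file named `libnvidia-ml.so.<version>`. The function names and sample strings are hypothetical. The real check is performed inside the driver components themselves.

```python
import re


def kernel_module_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version text."""
    match = re.search(
        r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)", proc_text
    )
    return match.group(1) if match else None


def library_version(resolved_path):
    """Extract the version suffix from a resolved libnvidia-ml path.

    Returns None for an unresolved soname such as libnvidia-ml.so.1.
    """
    match = re.search(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$", resolved_path)
    return match.group(1) if match else None


def versions_match(proc_text, resolved_path):
    """True only if both versions are known and identical."""
    kernel = kernel_module_version(proc_text)
    library = library_version(resolved_path)
    return kernel is not None and kernel == library
```

In practice the inputs would come from reading `/proc/driver/nvidia/version` and calling `os.path.realpath()` on the symlink. A mismatch here corresponds to the "API mismatch" message found in the library's strings.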

@deitch
Author

deitch commented Jan 9, 2024

> except for some chosen Jetson platforms?

Actually, the Jetson platform (the "official OS", anyways) is glibc-based.

> I'm not sure this would work in case of libnvidia-ml.so, because it's part of the "userspace driver" and pins the kernel module version?

How interesting. libnvidia-ml.so is pinned to a specific kernel version? As you point out, it is userspace, which is usually not the kind of thing that is pinned to a kernel version.

> Unlike libcuda.so, libnvidia-ml.so also comes without the "stub" libraries I believe

What do you mean?

3 participants