after "ldconfig /usr/local/cuda/lib64" I got the error information #140

Open
Austinzhenghua opened this issue Jun 23, 2021 · 6 comments

@Austinzhenghua

/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.

Does anyone know what is wrong? Thanks!

@SmartMapple

I got the same issue.

@elezar
Member

elezar commented Sep 1, 2022

The issue is that the container image was built using the nvidia-container-runtime instead of runc. This causes the NVIDIA Container CLI to mount these files into the container from the host, and they are then left in the image as zero-byte files.

Could you confirm which image this is?
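To check whether an image is affected, one option is to scan the library directory for zero-byte files with versioned shared-library names. The following is only a minimal sketch: the function name `find_empty_libraries` and the filename pattern are illustrative assumptions, not part of any NVIDIA tooling.

```python
import os
import re

# Assumed pattern for versioned shared objects such as libcuda.so.418.56
VERSIONED_SO = re.compile(r"\.so\.\d+(\.\d+)*$")


def find_empty_libraries(lib_dir):
    """Return sorted paths of zero-byte regular files that look like versioned .so files."""
    empty = []
    for entry in os.scandir(lib_dir):
        # Skip symlinks and directories; only regular files can be leftover placeholders
        if not entry.is_file(follow_symlinks=False):
            continue
        if not VERSIONED_SO.search(entry.name):
            continue
        if entry.stat(follow_symlinks=False).st_size == 0:
            empty.append(entry.path)
    return sorted(empty)
```

Run inside the container, `find_empty_libraries("/usr/lib/x86_64-linux-gnu")` should list the same files that ldconfig reports as "empty, not checked".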

@SmartMapple

I use nvidia-container-toolkit instead of the nvidia-container-runtime, but I think this may actually be caused by the problem you mentioned. I can confirm the image that I use. How can I solve this problem? Thanks for your help.

@elezar
Member

elezar commented Sep 1, 2022

If you can rebuild the image, you should be able to rebuild without the NVIDIA container runtime (also installed as part of the nvidia-container-toolkit package) and have this issue resolved.

If rebuilding the image is not possible, remove the /usr/lib/x86_64-linux-gnu/*.so.418.56 files from the image and repush / retag it.
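The removal step could be scripted along these lines. This is a hedged sketch rather than an official procedure: `remove_empty_driver_libs` and its `dry_run` parameter are hypothetical names, and the function only touches zero-byte regular files so that a real driver library is never deleted.

```python
from pathlib import Path


def remove_empty_driver_libs(lib_dir, driver_version, dry_run=True):
    """Delete zero-byte files named *.so.<driver_version>; return the affected paths."""
    affected = []
    for path in sorted(Path(lib_dir).glob(f"*.so.{driver_version}")):
        # Leave symlinks and anything that is not a regular file alone
        if path.is_symlink() or not path.is_file():
            continue
        # A non-empty file is a real library, not a leftover placeholder
        if path.stat().st_size != 0:
            continue
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected
```

Calling it first with the default `dry_run=True`, for example `remove_empty_driver_libs("/usr/lib/x86_64-linux-gnu", "418.56")`, shows what would be deleted before anything is changed. The image still has to be committed and retagged afterwards.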

@SmartMapple

Thanks, let me try.

@Davidrjx

Davidrjx commented Jul 12, 2023

@elezar Why? I see that the host running the container has libnvidia-ml.so, which refers to libnvidia-ml.so.<nv driver version>, as follows:

lrwxrwxrwx 1 root root      17 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root      25 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.525.78.01
-rwxr-xr-x 1 root root 1798712 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01

while the container cannot find a matching libnvidia-ml.so and fails with an error like:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

and the mount points in the container show:

/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcuda.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
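When comparing the injected libraries against the symlinks inside the container, it can help to extract the library names and driver version from mount output like the above. A rough sketch follows, assuming the `<source> on <target> type <fstype> (<options>)` line format shown here; the function name `injected_driver_libs` is made up for illustration.

```python
import re

# One line of mount output: "<source> on <target> type <fstype> (<options>)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")
# Library stem ending in .so, followed by a dotted driver version with at least two parts
DRIVER_LIB = re.compile(r"/(lib[^/]+\.so)\.(\d+(?:\.\d+)+)$")


def injected_driver_libs(mount_output):
    """Map library name (e.g. libcuda.so) to the driver version of its mount target."""
    libs = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        # group(2) is the mount target, i.e. the path inside the container
        lib = DRIVER_LIB.search(match.group(2))
        if lib:
            libs[lib.group(1)] = lib.group(2)
    return libs
```

If a library appears in this mapping but the matching `.so.1` or `.so` symlink is missing or points at a different version inside the container, that would be consistent with the loader not finding libnvidia-ml.so.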
