
Run desktop on Windows Subsystem for Linux #22

Open
j-p-e opened this issue Mar 21, 2022 · 17 comments

Comments

@j-p-e

j-p-e commented Mar 21, 2022

Thanks for these excellent resources.

Is the GLX one working OK at the moment? With the latest Docker and an up-to-date card (3080 Ti), running on a Windows host with --gpus all or --gpus 1 hits some faults.

On startup:
/proc/driver/nvidia/version is no longer present, so DRIVER_VERSION can't be found.
If you manually set DRIVER_VERSION to e.g. 510.47.03, the NVIDIA installer then fails because the following files are preloaded into the container: libnvidia-ml.so.1, libcuda.so.1, libnvcuvid.so.1, libnvidia-encode.so.1, libnvidia-opticalflow.so.1.

Finally, if you hack around all of those so it runs, the nvidia-xconfig command currently there gives 'no screens found' (though I suspect at this point any xconfig command would do the same).
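For reference, the kernel-side detection that fails here can be sketched in bash as below. This is not the project's actual script; the sample line merely mimics the usual format of /proc/driver/nvidia/version, which is absent under WSL2.

```shell
# Sketch of the usual kernel-side driver version detection; on WSL2,
# /proc/driver/nvidia/version is absent, so this path fails. The sample
# line below mimics the file's normal format and is illustrative only.
sample='NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.47.03  Mon Jan 24 22:58:54 UTC 2022'
if [ -f /proc/driver/nvidia/version ]; then
  line="$(head -n1 /proc/driver/nvidia/version)"
else
  line="$sample"
fi
# Pick the first whitespace-separated token that looks like a version number.
DRIVER_VERSION="$(printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if ($i ~ /^[0-9]+\.[0-9.]+$/) {print $i; exit}}')"
echo "$DRIVER_VERSION"   # prints 510.47.03 when the sample line is used
```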

@ehfd

ehfd commented Mar 21, 2022

Hi,

You have stated that you are running a Windows host. Does this mean that you are using the Windows Subsystem for Linux, or actually starting a container on Windows? The latter is impossible, as this is a Linux container; the former is untested but might be possible (I have no such setup).

then the nvidia installer fails because the following files are preloaded into the container : libnvidia-ml.so.1, libcuda.so.1, libnvcuvid.so.1, libnvidia-encode.so.1, libnvidia-opticalflow.so.1

This is not a big issue; normally the installation still completes correctly even when this happens.

if you hack all those so it runs then the nvidia-xconfig command currently there gives 'no screens found' (though I suspect at this point any xconfig command would do the same)

nvidia-xconfig is unfortunately unavailable in containers, so I wrote a custom routine in entrypoint.sh that uses nvidia-smi instead of nvidia-xconfig to generate xorg.conf. That part has already been worked around.
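The core translation involved in such a workaround can be sketched in bash as below; the xorg_busid helper name is illustrative and the actual entrypoint.sh logic may differ. It converts the bus ID that `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` prints (e.g. 00000000:09:00.0) into the decimal PCI:bus:device:function form an xorg.conf BusID line expects.

```shell
# Convert an nvidia-smi PCI bus ID (hex, e.g. 00000000:09:00.0) to the
# decimal "PCI:bus:device:function" form used in an xorg.conf BusID line.
# xorg_busid is an illustrative helper name, not the project's API.
xorg_busid() {
  local id="$1" bus dev fn
  bus="$(echo "$id" | cut -d: -f2)"       # hex bus, e.g. 09
  dev="$(echo "$id" | cut -d: -f3 | cut -d. -f1)"  # hex device, e.g. 00
  fn="${id##*.}"                          # function, e.g. 0
  echo "PCI:$((16#$bus)):$((16#$dev)):$((16#$fn))"
}
xorg_busid "00000000:09:00.0"   # prints PCI:9:0:0
```

The resulting string would then be written into the Device section of the generated xorg.conf before Xorg starts.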

is the glx one working ok at the moment ? It seems with the latest docker and up to date card (3080TI) when running on a windows host with --gpus all or --gpus 1 there are some faults

Could you post the logs located in /tmp inside the container (via docker exec) so I can troubleshoot further?

@j-p-e

j-p-e commented Mar 22, 2022

I was trying it as a Linux container on Windows (Docker Desktop in Linux container mode, with the WSL2 back end). Can you advise why that's impossible (or whether it should work)? I am also trying some things with docker-ce on an Ubuntu WSL2 host, which still gives 'no screens found' for Xorg but seems to get a better OpenGL result (with Windows running the X server instead).

@ehfd

ehfd commented Mar 24, 2022

Windows Subsystem for Linux might work. But the NVIDIA container runtime must be configured properly.
However, this is a totally untested territory and I don't know where to start.
I would really want to see this work though...

Could you post the full .log files from /tmp after starting the container and then opening a shell inside it with docker exec?

And also, please DO NOT start an X server with Windows Subsystem for Linux. The container has to start its own X server automatically instead.

@ehfd

ehfd commented Apr 16, 2022

Any luck? I haven't been able to prepare a setup with NVIDIA GPUs and WSL myself yet...

@ehfd ehfd changed the title not running with gpus -all and latest nvidia drivers Run desktop on Windows Subsystem for Linux Jun 14, 2022
@ehfd

ehfd commented Jun 15, 2022

This will be addressed when possible. Please hold tight.
Meanwhile, this use case is adequately served by https://github.com/ehfd/docker-nvidia-egl-desktop.

If anyone has succeeded in using either desktop with WSL, please share your experiences.

@ehfd

ehfd commented May 7, 2023

Ah, god. Did not have time.

@ehfd

ehfd commented May 8, 2023

It has been on my personal to-do list all along, and I will try to look into it.

@ehfd

ehfd commented Nov 9, 2023

It's going to become a bit easier because I eliminated the CUDA runtime in the containers.

@Umar-Azam

I am also interested in using this with WSL 2; as of now, it doesn't seem to be able to connect to a screen. noVNC works but has an awful framerate.

@ehfd

ehfd commented Jan 5, 2024

This is like the hundredth time I said this and I know that I'm going to procrastinate again, but I'll try my best to make it work.

@djpremier

djpremier commented Jun 9, 2024

@ehfd The EGL Docker image works like a charm 😀

[screenshots of the working EGL desktop]

I'm going to do some tests to see if I can make it work with Xorg instead of Xvfb (I understand that this is what really differentiates one from the other). If there is any other way I can help with your attempts, I am at your disposal.

@ehfd

ehfd commented Jun 23, 2024

Could someone post the output of the following commands from inside WSL as soon as possible?

nvidia-smi --version
nvidia-smi
nvidia-smi --query-gpu=driver_version --format=csv,noheader

@ehfd

ehfd commented Jun 23, 2024

# Install NVIDIA userspace driver components including X graphic libraries
if ! command -v nvidia-xconfig >/dev/null 2>&1; then
  # Driver version is provided by the kernel through the container toolkit
  export DRIVER_ARCH="$(dpkg --print-architecture | sed -e 's/arm64/aarch64/' -e 's/armhf/32bit-ARM/' -e 's/i.*86/x86/' -e 's/amd64/x86_64/' -e 's/unknown/x86_64/')"
  if [ -z "${DRIVER_VERSION}" ]; then
    # If kernel driver version is available, prioritize first
    if [ -f "/proc/driver/nvidia/version" ]; then
      export DRIVER_VERSION="$(head -n1 </proc/driver/nvidia/version | awk '{for(i=1;i<=NF;i++) if ($i ~ /^[0-9]+\.[0-9\.]+/) {print $i; exit}}')"
    # Otherwise, use the NVML version for compatibility with Windows Subsystem for Linux
    elif command -v nvidia-smi >/dev/null 2>&1; then
      export DRIVER_VERSION="$(nvidia-smi --version | grep 'NVML version' | cut -d: -f2 | tr -d ' ')"
    else
      echo "Failed to find NVIDIA GPU driver version. You might not be using the NVIDIA container toolkit. Exiting."
      exit 1
    fi
  fi
  # ... (driver download and installation continue here in the full script) ...
fi

I've edited the script to use the NVML version (i.e., the userspace library version) when the kernel driver version is unavailable. Hopefully this fixes the driver installation with WSL. Please test.
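The NVML fallback from the snippet above can be exercised in isolation on sample `nvidia-smi --version` output (the version numbers below are illustrative):

```shell
# Exercise the NVML-version fallback pipeline on sample output of
# `nvidia-smi --version`; the version numbers are illustrative.
sample='NVIDIA-SMI version  : 555.42.03
NVML version        : 555.42
DRIVER version      : 555.85
CUDA Version        : 12.5'
printf '%s\n' "$sample" | grep 'NVML version' | cut -d: -f2 | tr -d ' '   # prints 555.42
```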

@djpremier

Hi

$ nvidia-smi --version
NVIDIA-SMI version  : 555.42.03
NVML version        : 555.42
DRIVER version      : 555.85
CUDA Version        : 12.5
$ nvidia-smi
Sun Jun 23 14:54:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  |   00000000:09:00.0  On |                  N/A |
| 53%   46C    P0             89W /  350W |    2710MiB /  12288MiB |     17%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       401      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
555.85

@ehfd

ehfd commented Jun 24, 2024

...:~$ nvidia-smi --version
NVIDIA-SMI version  : 550.76.01
NVML version        : 550.76
DRIVER version      : 552.22
CUDA Version        : 12.4
...:~$ nvidia-smi
Sun Jun 23 21:43:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01              Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault
...:~$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
552.22
...:~$ nvidia-smi
Mon Jun 24 15:05:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

Windows:

nvidia-smi
Mon Jun 24 15:08:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 3000 Ada Gene...  WDDM  |   00000000:01:00.0 Off |                  Off |
| N/A   51C    P3             10W /   43W |       0MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Looks like the issue is very complicated. 555.42.03 isn't even in https://download.nvidia.com/XFree86/Linux-x86_64/, and NVML cuts the last digits off.

@ehfd

ehfd commented Jun 24, 2024

NVIDIA/nvidia-container-toolkit#563

I feel there isn't much I can do here right now. Opened an issue for the root cause.

docker-nvidia-egl-desktop will still work.

@ehfd

ehfd commented Jun 24, 2024

I've enabled the NVIDIA_DRIVER_VERSION variable, and things could possibly work out if you set NVIDIA_DRIVER_VERSION to one patch level below what nvidia-smi shows (for example, if nvidia-smi shows 550.76.01, set it to 550.76; if it shows 555.42.03, set it to 555.42.02).

However, working behavior is not guaranteed.
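The suggested decrement can be sketched as a small bash helper (previous_patch is an illustrative name, not part of the project); the result would then be passed to the container, e.g. via -e NVIDIA_DRIVER_VERSION=...:

```shell
# Derive a candidate NVIDIA_DRIVER_VERSION one patch level below the
# version nvidia-smi reports; previous_patch is an illustrative helper.
previous_patch() {
  local v="$1" base patch
  case "$v" in
    *.*.*)
      base="${v%.*}"; patch="${v##*.}"
      # 10# forces decimal so leading zeros (e.g. 08) are not read as octal.
      if [ "$((10#$patch))" -gt 1 ]; then
        printf '%s.%02d\n' "$base" "$((10#$patch - 1))"
      else
        printf '%s\n' "$base"   # xxx.xx.01 -> xxx.xx
      fi ;;
    *) printf '%s\n' "$v" ;;
  esac
}
previous_patch 550.76.01   # prints 550.76
previous_patch 555.42.03   # prints 555.42.02
```

Whether the resulting version actually exists on NVIDIA's download server still has to be checked by hand.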
