
ffmpeg with h264_nvenc fails to run on gVisor with -nvproxy #9452

Open
luiscape opened this issue Oct 4, 2023 · 13 comments
Labels: area: gpu (Issue related to sandboxed GPU access), type: bug (Something isn't working)

Comments

luiscape (Contributor) commented Oct 4, 2023

Description

ffmpeg supports video encoding and decoding using NVIDIA GPUs. Here's an example command:

wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4 && \
    ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4

Running that command fails on a container started with -nvproxy -nvproxy-docker with the following ffmpeg error:

...
[AVHWDeviceContext @ 0x55d500277300] cu->cuInit(0) failed -> CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
Device creation failed: -1313558101.
[h264 @ 0x55d500251900] No device available for decoder: device type cuda needed for codec h264.
...

This suggests that the cuInit(0) call itself fails inside the sandbox.

The same command succeeds in runc, encoding video correctly.

We pass NVIDIA_DRIVER_CAPABILITIES=all to expose the video capability.

Steps to reproduce

Build an OCI image from a Dockerfile such as:

FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install wget ffmpeg -y
RUN wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4

docker build -t ffmpeg-test -f Dockerfile .

Then run it on a system with a GPU available:

docker run --rm --runtime=runsc --gpus=all ffmpeg-test ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4

runsc version

runsc version release-20230920.0-21-ge81e0c72a70b
spec: 1.1.0-rc.1
@luiscape luiscape added the type: bug Something isn't working label Oct 4, 2023
@luiscape luiscape changed the title ffmpeg with h264_nvenc fails to run on gVisor with nvproxy ffmpeg with h264_nvenc fails to run on gVisor with -nvproxy Oct 4, 2023
ayushr2 (Collaborator) commented Oct 4, 2023

We don't support graphics/video capabilities yet.

@ayushr2 ayushr2 added the area: gpu Issue related to sandboxed GPU access label Oct 4, 2023
luiscape (Contributor, Author) commented Oct 4, 2023

Sounds good. Thank you for letting me know.


github-actions bot commented Feb 2, 2024

A friendly reminder that this issue had no activity for 120 days.

@github-actions github-actions bot added the stale-issue This issue has not been updated in 120 days. label Feb 2, 2024
@EtiennePerot EtiennePerot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 2, 2024
thundergolfer (Contributor) commented Aug 31, 2024

@ayushr2 we may take on the work to add the video capability to NVProxy. Many of our customers are running into this limitation when seeking to do GPU-accelerated ffmpeg stuff. Do you have any thoughts or objections before we do?

ayushr2 (Collaborator) commented Sep 3, 2024

@thundergolfer We are aligning internally around how to proceed with adding non-CUDA support. Let me get back to you once we have fleshed out the details.

thundergolfer (Contributor) commented

> how to proceed with adding non-CUDA support

It'd be the NVIDIA Video Codec SDK that we'd need to support, right?

Please do keep us in the loop :) We'd slotted in this work for mid-September but will of course adjust if it doesn't fit with your plans.

EtiennePerot (Contributor) commented

Please see #10856 which needs to happen before non-CUDA ioctls can be added to nvproxy.

@EtiennePerot EtiennePerot reopened this Sep 4, 2024
@github-actions github-actions bot removed the stale-issue This issue has not been updated in 120 days. label Sep 5, 2024
copybara-service bot pushed a commit that referenced this issue Sep 6, 2024
runsc attempts to emulate nvidia-container-runtime-hook. But it was always
passing "--compute --utility" as driver capability flags to
`nvidia-container-cli configure` command.

Fix runsc to emulate nvidia-container-runtime-hook correctly by parsing
NVIDIA_DRIVER_CAPABILITIES and converting that comma-separated list to flags.

This is in preparation for adding support for non-compute GPU workloads in
nvproxy :)

Updates #9452
Updates #10856

PiperOrigin-RevId: 671644915
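The conversion this commit describes, from a comma-separated NVIDIA_DRIVER_CAPABILITIES value to `nvidia-container-cli configure` flags, can be sketched in Go. This is an illustrative standalone sketch, not gVisor's actual implementation; the function names are hypothetical, and the capability list mirrors the ones documented for the NVIDIA Container Toolkit:

```go
package main

import (
	"fmt"
	"strings"
)

// capabilityFlags maps NVIDIA driver capability names (as they appear in
// NVIDIA_DRIVER_CAPABILITIES) to nvidia-container-cli configure flags.
// Illustrative only; see the NVIDIA Container Toolkit for the
// authoritative list.
var capabilityFlags = map[string]string{
	"compute":  "--compute",
	"utility":  "--utility",
	"graphics": "--graphics",
	"video":    "--video",
	"display":  "--display",
	"ngx":      "--ngx",
}

// flagsFromEnv converts a comma-separated NVIDIA_DRIVER_CAPABILITIES value
// into nvidia-container-cli flags. "all" expands to every known capability;
// unknown capability names are ignored.
func flagsFromEnv(caps string) []string {
	var flags []string
	if caps == "all" {
		for _, f := range capabilityFlags {
			flags = append(flags, f)
		}
		return flags
	}
	for _, c := range strings.Split(caps, ",") {
		if f, ok := capabilityFlags[strings.TrimSpace(c)]; ok {
			flags = append(flags, f)
		}
	}
	return flags
}

func main() {
	fmt.Println(flagsFromEnv("compute,utility,video"))
}
```

With this scheme, setting NVIDIA_DRIVER_CAPABILITIES=all (as in the original report) would include --video, which is what the nvenc/nvdec path needs, rather than the hard-coded "--compute --utility" that runsc previously passed.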
EtiennePerot (Contributor) commented Sep 7, 2024

Hi,

As per #10856, nvproxy cannot currently accept patches for nvenc/nvdec commands until it supports NVIDIA capability segmentation. @ayushr2 and others have started to work on this, and we expect it to be done (at least structurally done, i.e. the nvproxy ABI definitions will support being tagged by driver capabilities) by early October.

This is a bit later than your planned date for starting this. So in the meantime, as part of this work, it would also be great if you could contribute some NVENC/NVDEC regression tests as well, even if broken in gVisor at PR merge time. This is necessary not just for correctness, but also to ensure long-term maintainability as the NVIDIA driver and userspace libraries change. ffmpeg's h264_nvenc can take care of exercising nvenc, so that should definitely be one such test. Is there something similarly simple we can use for nvdec?

thundergolfer (Contributor) commented

Thanks for the reply @EtiennePerot. I've made regression testing the first task under our internal project 👍

copybara-service bot pushed a commit that referenced this issue Sep 9, 2024
This test does NOT work yet in gVisor.

Updates #9452

PiperOrigin-RevId: 670751228
EtiennePerot (Contributor) commented Sep 9, 2024

We may be able to reuse gVisor's existing ffmpeg image to avoid creating yet another Dockerfile for this. A regression using it can be as simple as this.

voidastro4 commented

Are there any plans to support GPU workloads in general, such as Vulkan, and potentially to implement virtio-gpu cross-domain Wayland?
We are interested in mostly replacing crosvm with gVisor.

ayushr2 (Collaborator) commented Sep 21, 2024

Yeah Vulkan support is on the roadmap. No ETA yet.

EtiennePerot (Contributor) commented

Once capability segmentation is in, patches welcome :)

copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
runsc attempts to emulate nvidia-container-runtime-hook. But it was always
passing "--compute --utility" as driver capability flags to
`nvidia-container-cli configure` command.

Fix runsc to emulate nvidia-container-runtime-hook correctly by parsing
NVIDIA_DRIVER_CAPABILITIES and converting that comma-separated list to flags.

This is in preparation for adding support for non-compute GPU workloads in
nvproxy :)

Updates #9452
Updates #10856

PiperOrigin-RevId: 678064565
copybara-service bot pushed a commit that referenced this issue Oct 4, 2024
This test does NOT work yet in gVisor.

Updates #9452

PiperOrigin-RevId: 682127429
Projects: None yet
Development: No branches or pull requests
5 participants