Launching GPU with nvidia runtime #284
Hey @aimran-adroll, I suspect the answer is "yes", although you might also be interested in recent GPU developments in Coiled over the last couple of months (package sync works, better GPU metrics, etc.). If you're game, it might be good to have you talk to @jrbourbeau, who did a bunch of this work. I'll bet he could point you in some fruitful directions. If that's interesting, send me a note offline and we'll set something up. cc'ing @ntabris to give the definitive "yes, that's fine" to your stated question, though.
Yes, that's fine. The VMs have the NVIDIA Container Toolkit, so you can use containers that see and use the GPU via the NVIDIA driver + CUDA.
FYI, this doc describes what our docker run command needs, so you can validate the container locally if you want.
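For a quick local sanity check along those lines, something like the following sketch runs `nvidia-smi` inside a CUDA base image via the NVIDIA Container Toolkit. The image tag and helper name are illustrative assumptions, not the exact command from the linked doc:

```python
import subprocess

# Illustrative local check: run `nvidia-smi` inside a CUDA base image to
# confirm the container can see the GPU. Requires Docker plus the NVIDIA
# Container Toolkit on the host; nothing runs until the function is called.
CHECK_CMD = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:12.2.0-base-ubuntu22.04",  # example tag; pick one matching your driver
    "nvidia-smi",
]

def container_sees_gpu() -> bool:
    """Return True if nvidia-smi ran successfully inside the container."""
    return subprocess.run(CHECK_CMD).returncode == 0
```

If `nvidia-smi` prints its usual GPU table, the runtime wiring works locally; `--gpus all` is the standard flag the NVIDIA Container Toolkit hooks into.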
This little Dockerfile did not work.
Locally it passed the check that @ntabris mentioned.
Command to launch the notebook:
Gist of the error:
Ah, sorry, this isn't easy to spot, but I think the problem is a mismatch between the image and VM architectures. When I dig into the (not super easy to find) logs, I see this:
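That kind of image/VM architecture mismatch (e.g. an arm64 image built on an Apple Silicon laptop, run on an amd64 VM) can be caught before pushing. A minimal sketch, with the mapping table as my own assumption rather than anything from the thread:

```python
import platform

# Docker names architectures amd64/arm64, while platform.machine() reports
# x86_64/aarch64 (or arm64 on macOS); normalize so the two can be compared.
MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "AMD64": "amd64",    # Windows
    "aarch64": "arm64",
    "arm64": "arm64",    # macOS on Apple Silicon
}

def host_docker_arch() -> str:
    """Docker-style architecture name for the current build host."""
    return MACHINE_TO_DOCKER_ARCH.get(platform.machine(), "unknown")
```

To force a build for the VM's architecture regardless of the host, `docker build --platform linux/amd64 ...` (or `docker buildx`) is the usual fix.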
Thanks for the quick debugging. 🚀 Aside: we need a cloud startup that lets you modify/build/push a Docker image in the cloud on just the right machine 😄 By the time I'm done pushing a 7 GB image over a residential network, I've forgotten what I wanted to do in the first place.
I'd be curious to learn more about why you want to use Docker in the first place. My guess is that either there's a piece of software you're trying to distribute that isn't in a convenient conda repository, or it's just very culturally entrenched. If that wasn't the reason, I'd probably want to question the choice of Docker and see if there is some other approach we could facilitate.
Great question. It's a fairly typical workflow for us/me: I want to try a new ML (or whatever) package. I have no idea what the dependencies are (especially when CUDA is involved, with its magical mix of different packages). The exact source recipe is not always easy to track down, and I have to weigh the upfront time investment. In these scenarios, a Docker container is a perfect answer to my conundrum: quick and easy to evaluate something new.
So, for common ML packages (PyTorch, TensorFlow, XGBoost, ...) we've been teaching package sync how to translate between CPU and GPU versions. If your package mostly depends on those (say you want to use some Hugging Face transformers package), then you just conda install it on your local machine and have Coiled spin up a cluster with GPUs attached. Coiled notices the change in architecture, swaps out the relevant packages, and has the conda solver fill in any gaps. It's pretty magical. If there were some other baseline GPU package you needed (say, JAX) that didn't already have this treatment, we could add it. The main reason not to use package sync in this case is if there is some GPU package for which there is no CPU equivalent, which you couldn't install on a non-GPU machine.
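The workflow described above might look roughly like this sketch. The `worker_gpu` parameter is assumed from Coiled's public `Cluster` API and is not verbatim from the thread, so check the current docs before relying on it:

```python
def launch_gpu_cluster(n_workers: int = 2):
    """Sketch: spin up a Coiled cluster with one GPU per worker and let
    package sync mirror the local conda environment onto the workers.

    Assumption: `worker_gpu` is taken from Coiled's documented Cluster
    API; verify against current Coiled docs.
    """
    import coiled  # imported lazily so the sketch doesn't require coiled installed

    cluster = coiled.Cluster(n_workers=n_workers, worker_gpu=1)
    return cluster
```

You'd then connect a `dask.distributed.Client` to the returned cluster and submit work as usual; package sync handles the CPU-to-GPU package translation described in the comment above.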
Wow, that does sound magical 🏃🏽♂️ Trying it now.
I would like to be able to launch notebooks using containers with the nvidia runtime. It'd be good to know if it's supported before I spend time preparing an image with additional Dask requirements.