[WIP] Add initial GPU support #4
base: master
Closes #3 |
Just wanted to leave my 2cents here: (Did not try Piper) |
Piper does not work because of this: rhasspy/rhasspy3#49 |
Whisper is still targeting 20.04. Is there a reason for that? |
This may need to be its own image since the majority of users would not want the cuda version |
Could this be split into 2 tickets, one for whisper and one for piper? The whisper portion is in reality the more useful of the two and benefits more from this feature, so it could go ahead if piper is experiencing issues. |
@wdunn001 The documentation at https://github.com/guillaumekln/faster-whisper/ says it requires cuDNN 8 for CUDA 11, and for those versions of CUDA and cuDNN the highest Ubuntu version available is 20.04. I had to look for it because, sadly, it was not working with the image I set for the other containers. |
Sorry, editing because I misunderstood your comment. But I guess for better maintainability the solution we add for one should be the same as for the others, so I think it is better to keep the conversation in a single issue and PR. |
And I'll try to add porcupine1 too |
Awesome! I am happy to help if you need anything. Would we want to add the docker arguments for the CUDA image to the documentation here? |
I added the changes. And yes, ofc we should document this, also I was thinking should we add a docker-compose.yml file? |
But in the README.md file right now there is just the documentation for using it pulling the images, not building them, so that will depend on the tags the maintainer might wanna use. Should we add building instructions to the README.md file? |
I think so, for sure. We can create a contributors section. I'll work on it: I will be building it for the first time this weekend, so I'll try to document the process. |
I will give you the docker-compose files and a starting point. |
I just added it, tell me how it works for you, you can create your own docker-compose.x.yml file for your use case. I have not added porcupine1 to the docker compose because it uses the same port as openwakeword, so for that particular case it could be added in the custom extend file. |
OK, so I am getting an error deploying this via compose or run: `usage: main.py [-h] --model {tiny,tiny-int8,base,base-int8,small,small-int8,medium,medium-int8} --uri URI --data-dir DATA_DIR [--download-dir DOWNLOAD_DIR] [--device DEVICE] [--language LANGUAGE] [--compute-type COMPUTE_TYPE] [--beam-size BEAM_SIZE] [--debug]` It needs additional params in contrast with the other build. These appear to be supplied by the run.sh file, and I see it's called in the Dockerfile. I added commands to the GPU compose file identical to those in the NOGPU version; they work fine, and I made a PR. It's only the ones in run.sh that seem not to work. I am on Ubuntu 22.04 with the latest Docker, if that matters. |
This is weird, according to the documentation, the only things not extended should be |
I needed to add
New to contributing, happy to hear thoughts. |
I rebased with the latest changes from master and the typo fixes in the README file. I don't think we need to create another branch; in the meantime you can just have an extend file where you use GPU options for whisper and openwakeword and non-GPU for piper. And regarding /var/data, I am generally against storing user data in a system folder. Also, passing the whole folder to the Docker container might load a lot of data from other applications that is not needed. |
@edurenye agreed using cpu for piper seems to be more than sufficient. I am still experiencing issues with openwakeword but it may just be my environment. I'll pull down the changes here and try again. I'll push any fixes I find to the PR on your branch. |
piper/GPU.Dockerfile
@@ -0,0 +1,35 @@ | |||
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 |
Perhaps we remove this file in the interim to get rid of dead code?
I do not see it as dead code, when this issue gets fixed it should just work right away.
ok sounds good
porcupine1/GPU.Dockerfile
@@ -0,0 +1,32 @@ | |||
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 |
Remove to get rid of dead code?
I do not see it as dead code either; people who want to use it can do so by extending the docker compose file, or use it directly with docker run
as documented here: https://github.com/rhasspy/wyoming-porcupine1/blob/master/README.md but adding the CUDA stuff.
sounds good
.gitignore
@@ -0,0 +1,12 @@ | |||
# OpenWakeWord |
Perhaps we reference managed volumes instead to prevent this?
i.e.
volumes:
  openwakeword-data:
  whisper-data:
  piper-data:
This is what I did in my version.
We could also add a -gpu suffix to volumes connected to GPU-enabled instances in the GPU compose file so that we can keep data separate between instance types.
Do you mean non-bind mounts? But then adding custom models (thinking mainly about OpenWakeWord here) is hard; with bind mounts you can just move the model to that directory. Also, I don't think there will be a case where you want to move from GPU to non-GPU while changing models, but I may be wrong there.
I think I agree with you here, probably the best way is to not bind them by default and then you can bind them extending the docker compose and point wherever you have the custom model.
Or maybe we could look at passing it as a parameter, haven't looked into it, I'm still fighting to generate the custom model actually.
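The trade-off discussed in this thread can be sketched as a small override file that mixes both approaches. This is only an illustration: the file name, the service names and the container paths below are assumptions, not taken from this PR.

```yaml
# Hypothetical override file, for example docker-compose.x.yml.
# Service names and container paths are assumptions for illustration.
services:
  openwakeword:
    volumes:
      # Bind mount: custom wake word models can simply be copied
      # into this host directory.
      - ./custom-models:/custom
  whisper:
    volumes:
      # Named (managed) volume: Docker decides where the data lives,
      # so nothing lands in the repository checkout.
      - whisper-data:/data

volumes:
  whisper-data:
```

With named volumes as the default, nothing needs to be listed in .gitignore, and a bind mount is added only by the users who actually bring their own model.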
docker compose down

### Run with GPU
Should we reference documentation on how to set up Docker for GPU? (I can of course add it in a separate PR.)
Yes, good idea!
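For reference, a minimal sketch of what GPU access usually looks like on the Compose side, assuming the host already has the NVIDIA driver and the NVIDIA Container Toolkit installed. The service name is an assumption; the `deploy.resources.reservations.devices` keys are the standard Compose way to request a GPU.

```yaml
services:
  whisper:            # assumed service name
    deploy:
      resources:
        reservations:
          devices:
            # Ask Docker for one NVIDIA GPU for this container.
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

If the container starts but cannot see the GPU, the host-side toolkit setup is typically the first thing to check.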
Good finding! It was not documented, but that parameter exists in https://github.com/rhasspy/wyoming-faster-whisper/blob/master/wyoming_faster_whisper/__main__.py |
Can u resolve the conflicts? I would love to see the improvements from using the GPU directly :) |
Doesn't work with piper since wyoming-piper doesn't declare the |
I agree there @tannisroot, let's start with NVIDIA until we know a good way to support all the other GPUs. I have simplified the files and left all the services that conflict commented out. I think this makes it easier to extend and maintain. Also, at some point I'll squash all the commits, since they make the rebases quite hard. |
I simplified the PR a lot, now there is no need for having 2 different Dockerfiles for each service, instead I pass the BASE image as an argument, and it seems to work fine. |
BTW I do not know why whisper is still with "debian:bullseye-slim", I know piper had issues, but whisper works fine for me, so I moved it to "debian:bookworm-slim" to simplify things. |
Any way to run it as standalone containers? I don't have GPU in my HA host, but I do on another one. |
Yes @spitfire, I have my Raspberry Pi with HA and another computer where I use this repo with GPU. |
I've tried using instructions from readme on your gpu branch (cloned it from the repo) . I'm getting:
The previous build command seems to have worked fine. I have other containers running on this VM with GPU passed through, and I know they are able to use it (I have nvidia docker runtime installed on it). Edit: Fixed that with:
for some reason other containers using gpu were fine without it. Now I'm getting this though:
|
Oh, that error might be me messing something up in the last commit, I'll take a look. |
Sorry @spitfire, I didn't know that the args were not available at run time; now I'm using envs at run time, which should fix the issue. Please pull the changes and try again. Also, the base was not working, and neither were the individual Dockerfiles; now I set default values that do not use GPU. Everything should work again now. |
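The difference between build args and environment variables can be sketched in Compose terms as follows. The build context and the `DEVICE` variable are assumptions for illustration; `BASE` is assumed to be the name of the argument mentioned earlier in this conversation.

```yaml
services:
  whisper:                  # assumed service name
    build:
      context: ./whisper    # assumed build context
      args:
        # Build-time only: usable by ARG and FROM in the Dockerfile,
        # but not visible to the process started in the container.
        BASE: nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
    environment:
      # Run-time: visible to the entrypoint script in the container.
      DEVICE: cuda          # hypothetical variable name
```

A value needed both while building and while running has to be passed in both places, or copied from an `ARG` into an `ENV` inside the Dockerfile.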
Pulled last commit:
then stopped (even reran build) and started containers again, but still that happens:
Should the |
Thanks for testing @spitfire, I could run it in my machine, which is weird. Which version of docker and docker compose are you using? Will look into this more tomorrow after work if I have time. But no, the I might be missing something. |
Just upgraded docker to 5:25.0.4-1 |
This was a big facepalm, since I was using the '-d' option and not really using piper or any of the others until now that I received the M5 ATOM ECHO, I didn't see it fail. So I fixed the issue with the entrypoint not accepting env variables directly. Then I realized CUDA was not working, I updated the python files because they were not matching the version 1.5.0 anymore. CUDA was finally working, but then the non GPU was not working, so I ended up splitting the two files again, now both versions work, finally! So, please try again @spitfire 🤞 |
Doesn't crash anymore, but doesn't work either. I've replaced the default voice in base config, but when asked to do TTS it fails like this:
|
Probably that is why my M5 ATOM ECHO follows the order but gets stuck before answering 😞 Well, I do not understand this error, I'll need help with it. This error seems to come from |
My echos had problems with responding even if that was not the case. You could set up the regular piper add-on on HA, change the pipeline to use it and see if that resolves the issue. I've modified my |
What I would recommend you to do is to create a Next thing I'll try for my echo is to use the base service, if that does not work I'll use it in HA, but I don't want to add this kind of things to HA since that would make the system slow or overheat, it's a RPi4. Also, I'll try to upgrade the Ubuntu image, see if that fixes the issue. |
That was the issue for my ECHO; upgrading did not fix it, but it does not add more problems, so I think once we fix this issue we can upgrade. In the end I used the base service and I have my ECHO working, but the responses sound awful; they would probably sound better with a better model on GPU. Also, the ECHO is not the best thing: sometimes the speaker makes sounds as if a wire were not fully connected, like electrical sparks. Furthermore, after a while it stops listening, and I need to unplug it and plug it back in. I think I need a better speaker, even if it's a bit more expensive. I would like to pass the responses to a better Bluetooth speaker directly from Home Assistant, but that will be hard and require time; right now Home Assistant doesn't even allow me to use Bluetooth speakers for playing music... |
Just wanted to chime in and say I got this running in k8s, and everything works fine (including openwake with a custom wake word :) ), except for Piper, I get the same error as the person above where it says no such file or directory: ''. I did some debug and it seems that there's something going on with stdin/stdout that is supposed to take the input text, generate the wave file, then pass it along. Not sure where that is breaking down, but that appears to be the root issue. :) |
I checked out your branch @edurenye and ran the docker-compose.gpu.yml. It started perfectly with the tiny model, which is preconfigured. I wanted to run the large-v3 model, so I changed it in both docker compose files (base and gpu), but the container never started and stayed stuck at "Attaching to wyoming-whisper-1". Is there anything I did wrong? |
@nikito I've been quite busy lately. I'll try to take another look at Piper this summer, but they need to add the support for GPU in the other tickets that are linked in this ticket. @tiko2302 Not sure about the error, but you should not change the base file or the gpu file; instead you should overwrite the model in a new file where you can extend from both files, for example I have this file for Catalan language
As you can see I use the base image for Piper, and GPU for the others. |
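A hypothetical sketch of such an override file, for readers who want a starting point. This is not the file referred to above: the base file name, the service names and the command layout are assumptions, while `--model` and `--language` are taken from the usage message quoted earlier in this conversation.

```yaml
# docker-compose.x.yml (hypothetical name and contents)
services:
  whisper:
    extends:
      file: docker-compose.gpu.yml    # GPU variant for whisper
      service: whisper
    # command replaces the inherited command entirely, so any flags
    # the extended file passes must be repeated here.
    command: --model medium --language ca
  piper:
    extends:
      file: docker-compose.base.yml   # assumed name of the CPU base file
      service: piper
```

Keeping local changes in a separate file like this leaves the base and GPU files untouched, so later pulls from the branch do not conflict.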
Checking in, any updates on this PR? :) |
Sorry @nikito, I haven't seen any progress on piper. I'm fighting with local LLMs now, but whisper is still working for me. I moved from external Snowboy to on-device micro_wake_word on Atom Echos, but it should still be working fine. This repo did not have any updates either. |
This is a work in progress.
I think for whisper it is working, but I'm not sure how to check it.
And for piper it is giving me an error, `unrecognized arguments: --cuda`, but I got the instructions from here: https://github.com/rhasspy/piper At the end it says that it should work just by installing `onnxruntime-gpu` and running piper with the `--cuda` argument. What am I missing?
I guess this will conflict with those that just want to use the CPU, how can we handle that? Making different images?
Ex: piper and piper-gpu