Extremely slow response #42
Comments
I've been able to get better (although still not great) response times by going into the settings (the gear icon) and increasing self.threads from the default 4 (I think that's what it was) to 10-13. The responses then use a LOT more CPU power and are (slightly) faster. Still around 30-40 seconds.
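For CPU-bound token generation, raising the thread count past the number of physical cores usually stops helping, which matches the small gains reported above. A minimal sketch of picking a sensible thread count (the `suggest_threads` helper is hypothetical, not part of alpaca-turbo; it assumes 2 logical CPUs per physical core on SMT machines):

```python
import os

def suggest_threads(physical_cores=None):
    """Hypothetical helper: pick a thread count for CPU-bound inference.

    Oversubscribing beyond physical cores rarely speeds up this kind of
    workload, so cap at the physical core count. When the caller doesn't
    know it, assume 2 logical CPUs per physical core (typical SMT).
    """
    logical = os.cpu_count() or 4
    physical = physical_cores or max(1, logical // 2)
    return physical

# Example: a Ryzen 7 5800X has 8 physical cores / 16 threads.
print(suggest_threads(physical_cores=8))  # → 8
```

On the Ryzen 7 5800X mentioned below, this would suggest 8 threads rather than 10-13, since the extra logical threads mostly contend for the same cores.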
I also tried raising the thread count, but it did not change much. The program generates maybe two words a minute. It uses ~60% CPU for about 20 seconds and then drops to between 1% and 5%. There are no issues with cooling, I checked that. It also only uses 2 GB of RAM. I have an AMD Ryzen 7 5800X with 16 GB of RAM, running it with Docker on Windows 11 with Debian WSL.
Amazed it's running at all, so thank you! But yes, it's unusably slow even when I give it more threads. Model Name: MacBook Pro
Same here... very slow responses, 120-300 seconds.
I've changed the threads too, but it only saves a few seconds. Still generating word by word, seconds apart ;-)
After getting rid of all the other issues (see the other issue tickets for "models could not be loaded due to localhost issue" and "only a specific model can be used") I finally managed to get alpaca-turbo running.
But if I type a question, it takes over 130 seconds to reply with only a fraction of a word. After around 210 seconds the first sentence was finally complete.
The Docker image is running on a server with 32 GB of RAM and 16 CPU cores. They are far from stressed (RAM usage 2.9 GB, CPU 25%).
What could be the issue?
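One thing worth ruling out on a containerized setup like this: Docker CPU limits (`--cpus` or a cpuset) can give the process fewer CPUs than the host reports, which would be consistent with ~25% utilisation on a 16-core machine (roughly 4 cores busy). A quick sketch to check from inside the container, assuming a Linux container where `os.sched_getaffinity` is available:

```python
import os

# How many CPUs may this process actually run on? On Linux this reflects
# the container's cpuset, which can be smaller than the host's CPU count.
try:
    usable = len(os.sched_getaffinity(0))  # Linux only
except AttributeError:
    usable = os.cpu_count() or 1  # fallback on non-Linux platforms

print(f"CPUs available to this process: {usable}")
```

If this prints fewer than 16, the thread setting inside alpaca-turbo can't help beyond that number.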