Extremely slow response #42

Open
auxilio-ab opened this issue Apr 6, 2023 · 4 comments

Comments

@auxilio-ab

After getting rid of all the other issues (see other issue tickets for "models could not be loaded due to localhost issue" and "only a specific model can be used") I finally managed to get alpaca-turbo running.

But if I type a question, it takes over 130 seconds to reply with only a fraction of a word. After around 210 seconds, the first sentence was finally completed.

The Docker image is running on a server with 32 GB of RAM and 16 CPU cores. Neither is anywhere near fully loaded (RAM usage 2.9 GB, CPU 25%).

What could be the issue?

@aalbrightpdx

I've been able to get better (although still not great) response times by going into the settings (the gear icon) and increasing self.threads from the default of 4 (I think that's what it was) to 10-13. Generation then uses a LOT more CPU and the responses are (slightly) faster.

Still around 30-40 seconds, though.
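
For anyone else tuning this, here is a rough way to reason about the thread setting. It's only a sketch, assuming the backend keeps roughly one core busy per thread; the helper names are made up for illustration and are not part of alpaca-turbo. Under that assumption, 4 threads on a 16-core machine would show up as about 25% overall CPU, which matches what the original post reports.

```python
import os


def expected_cpu_percent(threads: int, logical_cores: int) -> float:
    """Overall CPU utilisation if each thread fully occupies one core."""
    busy = min(threads, logical_cores)
    return 100.0 * busy / logical_cores


def suggest_threads(logical_cores: int, reserve: int = 2) -> int:
    """Hypothetical heuristic: use all cores except a small reserve, at least 1."""
    return max(1, logical_cores - reserve)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"logical cores: {cores}")
    print(f"suggested threads: {suggest_threads(cores)}")
    print(f"4 threads on 16 cores: {expected_cpu_percent(4, 16):.0f}% CPU")
```

The reserve of 2 cores is a guess to keep the rest of the system responsive; whether more threads actually help will depend on the model and the machine.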

@bendeguzszkalka
Contributor

I also tried raising the thread count, but it did not change much. The program generates about 2 words a minute. It uses ~60% CPU for about 20 seconds and then drops to between 1% and 5%. There are no issues with cooling; I checked that. It also only uses 2 GB of RAM. I have an AMD Ryzen 7 5800X with 16 GB of RAM, and I run it with Docker on Windows 11 with Debian on WSL.
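
One thing that might be worth ruling out under Docker is a CPU limit on the container. Below is a minimal sketch, assuming a Linux container using cgroup v2, where the limit is exposed in /sys/fs/cgroup/cpu.max as either "<quota> <period>" or "max <period>". The function names are illustrative, and on a cgroup v1 setup the file will simply not be there.

```python
import os
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit in cores from a cgroup v2 cpu.max line, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def container_cpu_limit(path: str = "/sys/fs/cgroup/cpu.max") -> Optional[float]:
    """Read the limit from the assumed cgroup v2 path; None if missing or unlimited."""
    p = Path(path)
    if not p.exists():
        return None
    return parse_cpu_max(p.read_text())


if __name__ == "__main__":
    print("cores visible to Python:", os.cpu_count())
    print("cgroup CPU limit (cores):", container_cpu_limit())
```

If the reported limit is well below the thread count set in the UI, extra threads would just compete for the same cores. That is only one possible explanation, not a confirmed cause.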

@wolfmcnally

Amazed it's running at all, so thank you! But yes, it's unusably slow even when I give it more threads.

Model Name: MacBook Pro
Model Identifier: MacBookPro17,1
Model Number: MJ123LL/A
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB

@BangerTech

Same here: very slow responses, 120-300 seconds.
Specs:

  • Intel Xeon, 12 cores
  • 32 GB RAM
  • SSD

I've changed the thread count too, but it only saves a few seconds. It still generates one word at a time, seconds apart ;-)
Still great work!!


5 participants