latency [BUG] #21

Open · Akash08naik opened this issue Jan 3, 2024 · 3 comments
Labels: bug (Something isn't working)


Akash08naik commented Jan 3, 2024

The latency reported by the tool and the actual time I measure when running inference are not the same; there is a huge difference between the two. What could be the problem?

Akash08naik added the bug label Jan 3, 2024
cli99 (Owner) commented Jan 3, 2024

Can you share how you ran the tool and the actual time you saw in your benchmarking?

Akash08naik (Author) commented

I ran the tool via a Slurm job. The actual time I observed came from loading the model and timing it with Python's time module, from when the prompt is given until decoding finishes. All of this was done on a CPU, not a GPU.
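For reference, a minimal sketch of that kind of end-to-end timing (assuming a Hugging Face transformers causal LM; the model name below is just a placeholder, not the model from this issue):

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model actually benchmarked
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Hello, world", return_tensors="pt")

# Time the full prompt-to-decoded-text path with the time module.
start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
elapsed = time.perf_counter() - start

print(f"end-to-end generation time on CPU: {elapsed:.3f} s")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```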

Akash08naik (Author) commented

What is happening is that the tool's inference time does not match the inference time I measure in real time. I ran it on a V100 GPU card with 16 GB of memory.
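One caveat when measuring on a GPU rather than a CPU: CUDA kernel launches are asynchronous, so wall-clock readings are only meaningful after torch.cuda.synchronize(), and a warm-up run keeps one-time initialization costs out of the measurement. A minimal sketch of such a measurement (same placeholder model as above):

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model actually benchmarked
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
model.eval()

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda")

# Warm-up run: not timed, absorbs one-time setup costs.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=32)

# Synchronize before and after so the timer brackets the actual GPU work.
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"timed generation on GPU: {elapsed:.3f} s for 32 new tokens")
```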
