
HF version inference much slower than the original version? How does inference differ between the two versions? #578

Open
KaidDuong opened this issue Oct 1, 2024 · 1 comment


@KaidDuong

python benchmarks/benchmark_generation_mamba_simple.py --model-name "AntonV/mamba2-130m-hf" --batch 1 --genlen 4096 --promptlen 600 

Output:
Loading model AntonV/mamba2-130m-hf
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
AntonV/mamba2-130m-hf prompt processing + decoding time: 132579ms

python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-130m" --batch 1 --genlen 4096 --promptlen 600 

Output:
Loading model state-spaces/mamba2-130m
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
state-spaces/mamba2-130m prompt processing + decoding time: 6962ms
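The ~19x gap above comes from timing prompt processing plus decoding as one interval. To reproduce the measurement outside the benchmark script, a small timing harness like the sketch below can wrap any model's generate call. The helper name `time_generation` is hypothetical, not part of the repo's benchmark code.

```python
import time

def time_generation(generate_fn, *args, **kwargs):
    """Run one generation call and return (output, elapsed milliseconds).

    generate_fn is any callable, e.g. a lambda wrapping model.generate;
    this mirrors how the benchmark reports a single combined
    "prompt processing + decoding" time rather than per-token latency.
    """
    start = time.perf_counter()
    out = generate_fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return out, elapsed_ms
```

For a fair comparison, both models should be timed after a warmup call (CUDA kernel compilation and memory allocation can dominate the first run) and with identical batch, prompt, and generation lengths, as in the two commands above.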

@tridao
Collaborator

tridao commented Oct 1, 2024

Idk how the HF version is implemented. We recommend the version in this repo.
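One possible cause of such a gap, not confirmed in this thread, is that the Transformers Mamba2 implementation falls back to a slow pure-PyTorch path when the fused `mamba-ssm` and `causal-conv1d` CUDA kernels are not installed, while this repo's version always uses them. A quick check (the helper name `fused_kernels_available` is hypothetical) is whether those packages import cleanly in the environment running the HF benchmark:

```python
def fused_kernels_available():
    """Return True if the fused kernel packages used by Mamba2 import cleanly.

    Assumption: when these imports fail, the Transformers Mamba2 model
    silently uses a much slower non-fused code path.
    """
    try:
        import mamba_ssm        # noqa: F401  fused selective-scan kernels
        import causal_conv1d    # noqa: F401  fused causal conv1d kernel
        return True
    except ImportError:
        return False
```

If this returns False, installing both packages and re-running the HF benchmark would show whether the slowdown is a missing-kernel issue rather than a difference in the model implementations themselves.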
