Cannot run quantized model, but original model works well #2368

Open
htranxp opened this issue Aug 20, 2024 · 0 comments
htranxp commented Aug 20, 2024

I got this error:
ggml/src/ggml-cuda/template-instances/../mmq.cuh:2589: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520
when running inference with the large-v2-q5_0 model; as the title says, the original large-v2 model works fine.
Device: Tesla T4
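
For context, the arch number in that message is the GPU compute capability times 100: arch 520 is compute capability 5.2 (Maxwell), while a Tesla T4 reports compute capability 7.5 (arch 750). The quantized mul_mat_q kernels presumably need integer dot-product instructions (dp4a, available from compute capability 6.1 onward) that 5.2 lacks, so a binary built only for arch 520 carries no usable device code for them, even though the unquantized path still runs. A minimal sketch to confirm what the driver actually reports for the device (the file name check_arch.cu is just an example; compile with `nvcc check_arch.cu -o check_arch`):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // A Tesla T4 should print compute capability 7.5 (CUDA arch 750);
        // the error above says the binary was compiled only for arch 520.
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If the device does report 7.5, rebuilding with that architecture included (for a CMake build, via the standard CMAKE_CUDA_ARCHITECTURES variable, e.g. `-DCMAKE_CUDA_ARCHITECTURES=75`) should produce device code that matches the T4 and lets the quantized kernels run.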
