Cannot run quantized model, but original model works well #2368

Open
htranxp opened this issue Aug 20, 2024 · 0 comments
htranxp commented Aug 20, 2024

I got this error:
ggml/src/ggml-cuda/template-instances/../mmq.cuh:2589: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520
when running inference with the large-v2-q5_0 model; as the title says, the original large-v2 model works fine.
Device: Tesla T4
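
For context, the arch number in that message is the GPU compute capability times 100: arch 520 is compute capability 5.2 (Maxwell), while a Tesla T4 reports compute capability 7.5 (arch 750). The quantized mul_mat_q kernels presumably need integer dot-product instructions (dp4a, available from compute capability 6.1 onward) that 5.2 lacks, so a binary built only for arch 520 carries no usable device code for them, even though the unquantized path still runs. A minimal sketch to confirm what the driver actually reports for the device (the file name check_arch.cu is just an example; compile with `nvcc check_arch.cu -o check_arch`):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // A Tesla T4 should print compute capability 7.5 (CUDA arch 750);
        // the error above says the binary was compiled only for arch 520.
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If the device does report 7.5, rebuilding with that architecture included (for a CMake build, via the standard CMAKE_CUDA_ARCHITECTURES variable, e.g. `-DCMAKE_CUDA_ARCHITECTURES=75`) should produce device code that matches the T4 and lets the quantized kernels run.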
