Model llama-2-7b.Q4_0.gguf Loads with llama.cpp but Fails with whisper.cpp #1316
Replies: 2 comments
-
Further Investigation on Model Compatibility: I've done some additional testing to narrow down the problem. I converted a llama model from the Meta repo on Hugging Face using the following command: […] Pleasingly, the converted model […]
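For context, a GGUF conversion with llama.cpp's scripts at the time typically looked like the lines below. This is a hypothetical reconstruction, since the exact command and paths in the comment above were cut off; the input path and output filenames are placeholders:

# Hypothetical example, not the exact command from this comment:
python3 convert.py /path/to/llama-2-7b --outtype f16 --outfile llama-2-7b-f16.gguf
./quantize llama-2-7b-f16.gguf llama-2-7b.Q4_0.gguf q4_0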
-
The issue seems to be with the newer quantization models; a Q8 GGUF model works fine.
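One way to check what a given file actually is, assuming it is GGUF, is to dump its header bytes:

xxd -l 8 ../llama.cpp/models/llama-2-7b.Q4_0.gguf
# The first 4 bytes should read as the ASCII magic "GGUF" (47 47 55 46);
# the next 4 are the little-endian format version. A version newer than
# what the llama.cpp copy bundled with talk-llama understands would
# explain a load failure like this one.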
-
Description:
Hello! When I try to run the model llama-2-7b.Q4_0.gguf (TheBloke repo) using llama.cpp, everything works fine. However, when I attempt to use the same model with whisper.cpp's talk-llama, I encounter an error. Additionally, I'd like to mention that executing ./main -m models/ggml-small.en.bin -f samples/jfk.wav works correctly without any issues.

Steps to Reproduce:
1. Load the llama-2-7b.Q4_0.gguf model using llama.cpp (works without issues).
2. Attempt to use the same model with whisper.cpp's talk-llama using the following command:
./talk-llama -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/llama-2-7b.Q4_0.gguf -p "Hey, there" -t 4
Expected Behavior:
The model should load and work without any issues, just as it does with llama.cpp.

Actual Behavior:
An error message is displayed, stating: […]
This is followed by a segmentation fault.
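A backtrace for the segmentation fault can be captured under lldb on macOS, reusing the arguments from the reproduce command above (a sketch, not output from this report):

lldb -- ./talk-llama -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/llama-2-7b.Q4_0.gguf -p "Hey, there" -t 4
(lldb) run
# after the crash:
(lldb) bt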
Additional Information:
Device: Apple M2
Model file: llama-2-7b.Q4_0.gguf
Whisper model file: ./models/ggml-small.en.bin
I would appreciate any guidance or insights into why this might be happening and how to resolve it. Thanks for your time!
Full Error Message:
[…]