Llama 3 8B IQ1_M takes 22 hours for perplexity testing on wikitext-2 #9486
-
I have compiled llama.cpp with GGML_USE_CUDA and am running Meta-Llama-3-8B-Instruct-IQ1_M.gguf through llama-perplexity on wikitext-2. The ETA shows 22 hours (on an Intel CPU). Meanwhile, when I run Meta-Llama-3-8B-Instruct-IQ4_NL.gguf, the ETA shows 6 hours. Any idea why?
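(For reference, the invocation in question looks roughly like this; the model and dataset file paths below are placeholders:)

```bash
# Perplexity run over the wikitext-2 test set; paths are placeholders.
./llama-perplexity -m Meta-Llama-3-8B-Instruct-IQ1_M.gguf -f wiki.test.raw
```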
Answered by joseph777111, Sep 16, 2024
-
See https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#cuda. GGML_USE_CUDA is not a build flag, so your binary was most likely built without CUDA support and the perplexity run is executing entirely on the CPU. Rebuild with Make or with CMake as described in those docs.
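For completeness, here is roughly what the CUDA build looks like per the linked docs (a sketch assuming a recent checkout; exact flags can vary by version). Note that GGML_USE_CUDA is an internal compile definition; the user-facing switch is GGML_CUDA:

```bash
# Rebuild with CUDA actually enabled.

# With Make:
make GGML_CUDA=1

# Or with CMake:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

After rebuilding, you also need to offload layers to the GPU when running llama-perplexity (e.g. -ngl 99); otherwise inference still falls back to the CPU, which would match the 22-hour ETA you are seeing.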
Answer selected by Abhranta