Llama 3 8B IQ1_M takes 22 hours for perplexity testing on wikitext-2 #9486
-
I have compiled llama.cpp with GGML_USE_CUDA and am running Meta-Llama-3-8B-Instruct-IQ1_M.gguf through llama-perplexity on wikitext-2. The ETA shows 22 hours (on an Intel CPU). Meanwhile, when I run Meta-Llama-3-8B-Instruct-IQ4_NL.gguf, the ETA shows 6 hours. Any idea why?
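(For reference, the invocation in question looks roughly like this; the model and dataset file paths below are placeholders:)

```bash
# Perplexity run over the wikitext-2 test set; paths are placeholders.
./llama-perplexity -m Meta-Llama-3-8B-Instruct-IQ1_M.gguf -f wiki.test.raw
```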
Answered by joseph777111, Sep 16, 2024
-
See https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#cuda. GGML_USE_CUDA is not a build flag, so your binary was most likely built without CUDA support and the perplexity run is executing entirely on the CPU. Rebuild with Make or with CMake as described in those docs.
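For completeness, here is roughly what the CUDA build looks like per the linked docs (a sketch assuming a recent checkout; exact flags can vary by version). Note that GGML_USE_CUDA is an internal compile definition; the user-facing switch is GGML_CUDA:

```bash
# Rebuild with CUDA actually enabled.

# With Make:
make GGML_CUDA=1

# Or with CMake:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

After rebuilding, you also need to offload layers to the GPU when running llama-perplexity (e.g. -ngl 99); otherwise inference still falls back to the CPU, which would match the 22-hour ETA you are seeing.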
Answer selected by Abhranta