Support for ggml #417
@philwee tagging the python bindings you shared, which should make it much easier to add ggml support:
If someone wants to work on this I’d be happy to give pointers! All that’s required is a new LM subclass akin to #395. I may take a look at working on this integration on our end in ~1 month from now, if no one else has started a PR by then.
I can try to work on this, could you give some pointers?
Of course! I’d recommend looking at the PR I linked to get a sense of what the scope might be. The process would look something like:
Lmk if this makes sense!
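
For orientation, here is a minimal sketch of what such an LM subclass might look like, assuming the llama-cpp-python bindings and the harness's `loglikelihood`/`greedy_until` interface. The import path, constructor arguments, and request formats are assumptions and should be checked against the actual base class and PR #395.

```python
# A minimal skeleton, assuming the llama-cpp-python bindings and an LM base
# class with loglikelihood/greedy_until methods (import path and request
# formats are assumptions; check lm_eval.base and PR #395 for the real API).
from llama_cpp import Llama

from lm_eval.base import LM  # assumed location of the harness base class


class GGMLLM(LM):
    def __init__(self, model_path, n_ctx=2048):
        super().__init__()
        # logits_all=True keeps per-token logits around, which a real
        # loglikelihood implementation would need for scoring continuations.
        self.model = Llama(model_path=model_path, n_ctx=n_ctx, logits_all=True)

    def loglikelihood(self, requests):
        # Expected to take (context, continuation) pairs and return
        # (logprob, is_greedy) tuples; left unimplemented in this sketch.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError

    def greedy_until(self, requests):
        # Expected to take (context, stop_sequences) pairs and return strings.
        outputs = []
        for context, until in requests:
            completion = self.model(
                context, max_tokens=256, temperature=0.0, stop=until
            )
            outputs.append(completion["choices"][0]["text"])
        return outputs
```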
Carson Poole reports:
So it may be worth lowering the priority on this. Of course, implementing it would enable us to better evaluate these claims 🙃
There exists BLAS support (OpenBLAS, cuBLAS, CLBlast), which outperforms the plain SIMD-tuned code at larger batch sizes (OpenBLAS -> CPU, cuBLAS and CLBlast -> GPU). The BLAS acceleration can already make a difference with single-digit batch sizes. Edit: also, since only the logits are of interest, eval can be done in very large batch sizes (even better for BLAS).
Personally I think this one is better (no need to call that one a "starting point").
I saw that, but per the issue at abetlen/llama-cpp-python#71 it appears to be 5x slower than the underlying implementation.
It might be because it does not build the llama.so/.dll properly / only in one configuration, so SIMD might be disabled. There is also the fact that there is no official BLAS-enabled build available anywhere (see abetlen/llama-cpp-python#117).
But they are "easy" to fix after the fact, since you can build the llama.dll yourself with the build options that you like and replace the one shipped with the bindings (recommended right now).
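
If you do rebuild the shared library and swap it in, one way to sanity-check the result is to print llama.cpp's system info through the bindings. This is a sketch that assumes llama-cpp-python re-exports the underlying `llama_print_system_info()` C function; the exact attribute location may differ between versions.

```python
# Sketch: check which CPU/GPU features the loaded llama library was built with.
# Assumes llama-cpp-python re-exports llama.cpp's llama_print_system_info();
# the attribute location is an assumption and may vary by version.
import llama_cpp

info = llama_cpp.llama_print_system_info()  # bytes like b"AVX = 1 | ... | BLAS = 0 | ..."
print(info.decode("utf-8"))

# If "BLAS = 0" (or the SIMD flags you expect are 0), the shipped binary was
# probably built without those options; rebuild libllama and replace it as
# described above.
```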
@Green-Sky I have almost no experience with C, but if you can do that and demonstrate acceptable speed, that works for me.
@StellaAthena If you want to give me a representative test prompt, I can compare llama-cpp-python to native llama.cpp. Here are my (short-run comparative) perplexity scores to date with the models I have on hand.
llama-cpp-python attempts to implement the OpenAI API, so I may look at simply pointing the harness at an instance of its server.
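
As a rough illustration of that approach (not the eventual harness integration itself): start the bundled server and hit its OpenAI-style completions endpoint. The module name, default port, and request fields below are assumptions based on llama-cpp-python's server and may differ by version.

```python
# Sketch: querying a locally running llama-cpp-python server through its
# OpenAI-compatible completions endpoint. Assumes the server was started with
# something like `python -m llama_cpp.server --model <path-to-ggml-model>`
# and listens on localhost:8000 (defaults may differ by version).
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "The capital of France is",
        "max_tokens": 8,
        "temperature": 0.0,
        # logprobs/echo are what a harness integration would mostly care about,
        # since scoring tasks need per-token log-probabilities, not samples.
        "logprobs": 1,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```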
Sounds great!
Started adding support for a llama-cpp-python server here: #617
Courtesy of @matthoffner, lm-eval now supports GGML Llama models via the llama-cpp-python server integration (#617).
Could support for ggml be added to this soon? 4-bit quantized models are said to be pretty decent, but there is currently no reliable way to test this out, so it would be nice to have support for it here.
Thank you!