The following models have been tested and are known to work with `llm`.
Models are distributed as single files, but come in a variety of quantization levels. You will need to select the quantization level that is appropriate for your application. For more information, see Getting Models in the README.
The LLaMA architecture is the best-supported.
We have chosen not to include any models based on the original LLaMA model due to licensing concerns.
However, the OpenLLaMA models are available under the Apache 2.0 license and are compatible with `llm`.
- https://huggingface.co/rustformers/open-llama-ggml
- https://huggingface.co/TheBloke/open-llama-13b-open-instruct-GGML
- https://huggingface.co/TheBloke/Flan-OpenLlama-7B-GGML
Models based on the original LLaMA model are also compatible, but you will need to find them yourself due to their licensing.
- https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP-ggml: note that this is `f16`-only, and we recommend you quantize it using `llm` for best performance.
- https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML