Why any model running with llama server behave differently? #9660

alexcardo · 2024-09-26T17:05:43Z

alexcardo
Sep 26, 2024

What exactly I need to do to force llama server behave the same way as it works in llama cli or any implementation mode?

I'll explain it. Every model running with llama cpp works as expected when it's run from within any app like ollama or LM Studio, or even llama-cli. Yet, once I'm trying to run the model in the llama cpp server mode I stumble upon the same issue for months.

My idea is to use a model as a translator. I've been trying lots of them. Currently, I'm trying to work with Qwen 2.5 Q4.

If I ask the model to (literally): "Translate this text from Dutch to English" in -cnv (chat) mode, the result will always be an English output. Yet, once I'm attempting to do the same in production mode (in my case llama server), the model can accidentally write the same text in Dutch totally ignoring my instructions. The bug can happen and can not. But if the bug happens, it will be continuing every next run of the model; I mean with every API call.

I spent months with this issue. There is no flexible Python instruction, so I'm using the one presented in the official documentation (using it via openai)....

I'm totally disappointed. And i don't know what to do.

All I need is that model to behave absolutely the same way as it behave in conversation mode and that's it.

mirek190 · 2024-09-26T18:45:06Z

mirek190
Sep 26, 2024

are you using curl for llamacpp server? What configuration parameters are you sending to via curl?

2 replies

alexcardo Sep 26, 2024
Author

No, I don't use CURL as I need a Python implementation. As mentioned above, I use the approach provided in the official instruction... In this particular example, I'm trying to deal with the LM Studio server (which is based on llama cpp), but I experience the same behavior with the bare llama server.

url = "http://127.0.0.1:8080/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}
data = {
    "model": "lmstudio-community/Qwen2.5-7B-Instruct-GGUF/Qwen2.5-7B-Instruct-Q4_K_M.gguf",
    "messages": [
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": f'''Translate this text from Dutch to English. Keep markdown: {markdown_output}'''}
    ],
    "temperature": 0,
    "max_tokens": -1,
    "stream": False
}

As mentioned here:

Examples:

You can use either Python openai library with appropriate checkpoints:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)

completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
    {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
    {"role": "user", "content": "Write a limerick about python exceptions"}
]
)

print(completion.choices[0].message)

alexcardo Sep 26, 2024
Author

You can take this code presented in the official documentation, take the same model with the same quant, send 10 articles to translate to the llama cpp server API, and you'll get 4 of them translated from Dutch to English, while the rest 6 will remain in Dutch.

Meanwhile, if you feed them all in the -cnv (chat) mode, all of them will be translated correctly.

I've been trying LLAMA 3.1-2, Gemma, QWEN, OLMoE, etc... All of them behave the same way with the llama cpp server.

Perhaps I need to use a prompt template somehow in the API request...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why any model running with llama server behave differently? #9660

{{title}}

Replies: 1 comment 2 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Why any model running with llama server behave differently? #9660

alexcardo Sep 26, 2024

Replies: 1 comment · 2 replies

mirek190 Sep 26, 2024

alexcardo Sep 26, 2024 Author

alexcardo Sep 26, 2024 Author

alexcardo
Sep 26, 2024

Replies: 1 comment 2 replies

mirek190
Sep 26, 2024

alexcardo Sep 26, 2024
Author

alexcardo Sep 26, 2024
Author