-
Are you using curl with the llama.cpp server? What configuration parameters are you sending via curl?
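For reference, this is roughly the JSON body a curl call would carry. Sketched in Python rather than curl; the endpoint, port, and sampling values are assumptions based on a default `llama-server` launch on localhost:8080, so adjust to your setup:

```python
import json

def build_translation_request(text, source="Dutch", target="English"):
    """Build an OpenAI-style chat-completions payload for llama-server
    (its /v1/chat/completions endpoint)."""
    return {
        "messages": [
            # A system message pins the task so the model is less likely
            # to drift back into the source language.
            {"role": "system",
             "content": f"You are a translator. Reply in {target} only."},
            {"role": "user",
             "content": f"Translate this text from {source} to {target}:\n{text}"},
        ],
        # A low temperature keeps output more deterministic across API calls.
        "temperature": 0.2,
    }

payload = build_translation_request("Goedemorgen, hoe gaat het?")
print(json.dumps(payload, indent=2))
```

The same body could be POSTed with `curl -d @payload.json http://localhost:8080/v1/chat/completions`.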
-
What exactly do I need to do to force llama-server to behave the same way it does in llama-cli or any other implementation?
I'll explain. Every model run with llama.cpp works as expected when launched from an app like Ollama or LM Studio, or even from llama-cli. Yet as soon as I run the same model in llama.cpp server mode, I hit the same issue I've been stumbling over for months.
My idea is to use a model as a translator. I've tried many of them; currently I'm working with Qwen 2.5 Q4.
If I ask the model (literally): "Translate this text from Dutch to English" in -cnv (chat) mode, the result is always English output. But when I attempt the same thing in production mode (in my case, llama-server), the model can unexpectedly write the same text back in Dutch, completely ignoring my instructions. The bug may or may not appear, but once it does, it persists on every subsequent run of the model, that is, with every API call.
I've spent months on this issue. There is no flexible Python example available, so I'm using the one presented in the official documentation (via the openai client)....
I'm totally disappointed, and I don't know what to do.
All I need is for the model to behave exactly the same way it behaves in conversation mode, that's it.
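Here is roughly what I'm doing, simplified. The base URL, model alias, and sampling values are placeholders for my actual setup; the network call is kept inside `main()` so the helper can be shown on its own:

```python
def translation_messages(text, source="Dutch", target="English"):
    """Mimic -cnv mode: an explicit system turn plus the user request,
    so the chat template always carries the instruction."""
    return [
        {"role": "system",
         "content": f"You are a professional translator. "
                    f"Reply with the {target} translation only."},
        {"role": "user",
         "content": f"Translate this text from {source} to {target}:\n{text}"},
    ]

def main():
    # Requires `pip install openai` and a running llama-server.
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key")
    resp = client.chat.completions.create(
        model="qwen2.5",  # placeholder model alias
        messages=translation_messages("Goedemorgen, hoe gaat het?"),
        temperature=0.2,  # keep sampling close to greedy, like chat mode feels
    )
    print(resp.choices[0].message.content)
```

Even with the system turn pinned like this, the server can still slip back into Dutch on some runs.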