
Ollama: adding support for num_ctx parameter based on max_input_tokens #806

Closed
peat wants to merge 1 commit

Conversation

@peat peat commented Aug 27, 2024

It does what it says on the tin.

Ollama uses the num_ctx option to specify the maximum context length for a request. If it is absent, Ollama (v0.3.6) defaults to 2048 tokens, regardless of what an aichat user specifies as max_input_tokens in their local configuration.
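
For reference, the relevant part of an aichat configuration looks roughly like this (a minimal sketch; the model name, token count, and api_base are illustrative, and other client fields are omitted):

    clients:
      - type: ollama
        api_base: http://localhost:11434   # assumed local Ollama endpoint
        models:
          - name: llama3.1
            max_input_tokens: 131072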


peat commented Aug 27, 2024

In this case, both require_max_tokens and max_output_tokens need to be set in the config file. Would you like me to update the config generator as well? It currently skips require_max_tokens, so this is disabled out of the box.

Separate (but related): the num_predict option is determined by the same value, although it has a different meaning. num_predict tells Ollama the limit on how many tokens it can generate.

I recommend that if max_tokens_param returns Some, we should:

  • Set num_ctx to the value
  • Set num_predict to -1, which tells Ollama that it can use the remaining token space for its output.

Alternatively, we could use max_output_tokens as the num_predict value, since I think that's more closely aligned with the intended use (both options are sketched below).
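
A rough sketch of both options, following the pattern the ollama client already uses for num_predict (the max_input_tokens() accessor is an assumption here, not the actual implementation):

    // Sketch only: assumes a max_input_tokens() accessor analogous to max_tokens_param().
    if let Some(ctx) = model.max_input_tokens() {
        // Pass the configured context window so Ollama doesn't fall back to its 2048-token default.
        body["options"]["num_ctx"] = ctx.into();
        // -1 tells Ollama to use the remaining context space for output;
        // alternatively, max_output_tokens could be used here instead.
        body["options"]["num_predict"] = (-1).into();
    }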

Let me know your thoughts and I'll do the bit fiddling. Thanks!


sigoden commented Aug 27, 2024

My apologies. max_tokens_param is for max_output_tokens, not max_input_tokens.

The ollama client is already aware of max_tokens_param:

aichat/src/client/ollama.rs

Lines 236 to 238 in 11022f8

if let Some(v) = model.max_tokens_param() {
    body["options"]["num_predict"] = v.into();
}

The num_ctx parameter should derive from max_input_tokens; please revert your latest commit.


sigoden commented Aug 28, 2024

Usually, we don't need to pass max_input_tokens to the API server. Ollama is the only client that might use max_input_tokens, and we haven't yet decided whether to derive num_ctx directly from it.

The currently available solution is to use patch:

    patch:
      chat_completions:
        'llama3.1':
          body:
            options:
              num_ctx: 131072
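
With this in place, aichat merges options.num_ctx: 131072 into the chat request body it sends to Ollama for the llama3.1 model, which achieves the same end result as this PR, just configured per model by hand.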


peat commented Aug 28, 2024

OK, cool. I can use patch in the meantime while you're sorting out the details. There are quite a few other Ollama-specific parameters, and I'm happy to help if you decide to expand those capabilities. Cheers!


sigoden commented Aug 28, 2024

Per ollama/ollama#6504 (comment), the Ollama author says that num_ctx will be set automatically based on available VRAM and compute.

Also, we've decided to deprecate the ollama client in the future and instead use an openai-compatible client to access the ollama API.
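
For example, something along these lines (a sketch, not final configuration; it assumes aichat's openai-compatible client type pointed at Ollama's OpenAI-compatible endpoint):

    clients:
      - type: openai-compatible
        name: ollama
        api_base: http://localhost:11434/v1
        models:
          - name: llama3.1
            max_input_tokens: 131072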

So we have decided not to derive num_ctx directly from max_input_tokens; doing so would disrupt the current tacit understanding and introduce a breaking change in the future.

We suggest setting num_ctx using patch.

Thank you for your contribution.


peat commented Aug 28, 2024

Oh snap, that's great. Thank you!

@peat peat closed this Aug 28, 2024
@AllyourBaseBelongToUs

It would be nice if we could have a config file somewhere for server settings, without fiddling with environment variables.

Right now the server still overrides the context window to 2k unless the Modelfile specifies the right parameters (even if you ollama run MODEL and then /set the parameters).

It's especially painful when using a closed-source program that talks to the local Ollama server but has its own max-token settings, etc.
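
One workaround that sticks for every client of the local server is to bake the parameter into a derived model with a Modelfile (a sketch; the base model and value are illustrative):

    FROM llama3.1
    PARAMETER num_ctx 131072

and then build it with ollama create llama3.1-131k -f Modelfile, so anything that requests the derived model gets the larger context window without touching environment variables.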
