
[Investigation] Format of completion calls issues, provider-specific features and refactoring #3812

Open
enyst opened this issue Sep 11, 2024 · 3 comments
Labels
backend Related to backend enhancement New feature or request severity:medium Affecting multiple users

Comments

enyst (Collaborator) commented Sep 11, 2024

Summary
Investigate which provider compatibility tweaks are necessary and why; isn't liteLLM supposed to handle this? If something is missing from liteLLM, maybe we can help them add it, and in the meantime clearly comment where our code is temporary.

Motivation
The LLM and Message classes, and the agent code working with Messages, have become heavy and include quite a few fixes for provider issues or special cases/features. Some of these are extremely provider-specific:

  • Prompt caching is currently Anthropic-only, and we can look to simplify/abstract it. Example: context caching for Gemini is very different from prompt caching for Anthropic, but liteLLM has already implemented a compatibility layer that makes them very similar in actual use. We could adopt it and take the opportunity to refactor the implementation a bit.
  • The prompt caching feature is relatively new and, in terms of the format tweaks and cross-API compatibility it needs, similar to the vision feature, yet I haven't found a liteLLM "supports_caching" analogous to "supports_vision". Maybe we can propose it to liteLLM; it would help us avoid hardcoding providers (see the sketch after this list). Ref: [Feature]: supports_prompt_caching property for LMs BerriAI/litellm#5776
  • The Groq API can return errors, and to work around them we have manually serialized messages to single strings, yet according to its published API it supports messages as arrays of content parts, as produced by our pydantic serialization for user messages. So is the manual formatting necessary? The system message is a single string, so we could serialize that separately; system and assistant messages are single strings.
  • Gemini works with system and assistant messages as strings and user messages as vision-like dicts. Without vision, it is reported to raise a similar error, needing the system message as a string, while its API for user messages seems compatible with pydantic serialization. Does anything need content parts for the system role? If not, we can standardize on the system message as a string instead.
    - Groq offers an OpenAI-compatible API endpoint, which suggests we could use the regular messages format (including vision-enabled) without provider-specific tweaks. (Are we using it? If not, why not?) However, the compatibility is partial, and these incompatibilities are undocumented.
  • We have recently changed the FE to select a provider and then a model per provider. While this seems like an overdue standardization and makes the choice easier, are we missing anything: can users still specify "openai/" to have liteLLM re-route the calls to OpenAI-compatible endpoints? Is it possible or desirable to add this prefix ourselves for some providers, or instead of some providers, to guide the choice of the correct endpoint?
  • related issue, hopefully to be solved in liteLLM
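
To make the hardcoding concrete, here is a minimal sketch of the kind of checks and serialization we currently spell out by hand. litellm.supports_vision is an existing liteLLM helper; supports_prompt_caching below is the hypothetical helper requested in BerriAI/litellm#5776, and the model list and flatten_content are illustrative stand-ins for our current workarounds, not the actual repo code.

```python
import litellm

# Illustrative list of models we currently hardcode as supporting prompt caching
# (a stand-in for the real list, which lives in our LLM code today).
CACHE_PROMPT_SUPPORTED = (
    "claude-3-5-sonnet-20240620",
    "claude-3-haiku-20240307",
)


def supports_prompt_caching(model: str) -> bool:
    """Hypothetical helper: what a liteLLM `supports_prompt_caching` could replace."""
    return any(model.endswith(name) for name in CACHE_PROMPT_SUPPORTED)


def flatten_content(content) -> str:
    """Illustrates the manual Groq workaround: collapse a list of content parts
    (as produced by pydantic serialization) into a single string."""
    if isinstance(content, str):
        return content
    return "\n".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )


model = "anthropic/claude-3-5-sonnet-20240620"

# Vision support is already queryable through liteLLM; caching support is not (yet).
if litellm.supports_vision(model=model):
    print("keep image content parts in user messages")
if supports_prompt_caching(model):
    print("add cache_control markers before sending the messages")
```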

Technical Design

  • Rely on liteLLM to offer OpenAI-compatible endpoints for all models and providers we work with, and clean up the code as much as possible by doing just that (see the sketch after this list).
  • Comment in big scary letters where the code is temporary 😅
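
As a minimal sketch of the "just let liteLLM do it" approach (the model names and the local api_base below are examples, not what we ship), the same OpenAI-format messages are sent to every provider and routing is driven purely by the model prefix, including the "openai/" prefix mentioned above for OpenAI-compatible endpoints:

```python
import litellm

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [{"type": "text", "text": "Hello!"}]},
]

# Same call shape for every provider; liteLLM translates the message format.
for model in (
    "anthropic/claude-3-5-sonnet-20240620",
    "groq/llama-3.1-70b-versatile",
    "gemini/gemini-1.5-pro",
):
    resp = litellm.completion(model=model, messages=messages)
    print(model, resp.choices[0].message.content)

# The "openai/" prefix routes to any OpenAI-compatible endpoint,
# e.g. a self-hosted server (api_base here is only an example value).
resp = litellm.completion(
    model="openai/my-local-model",
    api_base="http://localhost:8000/v1",
    messages=messages,
)
```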

Alternatives to Consider
We can continue to tweak things ourselves, but it's becoming difficult to work with. Issues keep cropping up due to the (warranted? unwarranted?) complexity.

enyst (Collaborator, Author) commented Sep 11, 2024

Also, I think we lack test coverage for the code handling Messages. We have one test for a small part of the prompt caching feature.
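
As a self-contained sketch of the missing coverage (the Message and TextContent classes below are stand-ins for the real ones in the repo, just to show the shape such tests could take):

```python
from typing import Literal

from pydantic import BaseModel


class TextContent(BaseModel):
    """Stand-in for the repo's text content part."""
    type: Literal["text"] = "text"
    text: str


class Message(BaseModel):
    """Stand-in for the repo's Message model."""
    role: Literal["system", "user", "assistant"]
    content: list[TextContent]


def test_user_message_serializes_to_content_parts():
    msg = Message(role="user", content=[TextContent(text="hello")])
    assert msg.model_dump()["content"] == [{"type": "text", "text": "hello"}]


def test_system_message_can_be_flattened_to_a_string():
    msg = Message(role="system", content=[TextContent(text="be helpful")])
    flattened = "\n".join(part.text for part in msg.content)
    assert flattened == "be helpful"
```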

@enyst enyst added enhancement New feature or request backend Related to backend severity:medium Affecting multiple users labels Sep 11, 2024
github-actions bot (Contributor) commented

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

@github-actions github-actions bot added the Stale Inactive for 30 days label Oct 19, 2024
enyst (Collaborator, Author) commented Oct 19, 2024

I think there are two issues remaining here, both related to prompt caching:

  • prompt caching for Gemini, unsolved
  • a temporary variable in llm.py for Anthropic models, which was fixing an issue with the UI model choice before liteLLM did (see the sketch after this list)
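
For reference, a minimal sketch of what the prompt-caching markup looks like when routed through liteLLM's Anthropic compatibility layer, which is roughly what the temporary code in llm.py gates on (the model name is an example; exact header/beta handling is left to liteLLM):

```python
import litellm

# Mark the large, stable prefix (e.g. the system prompt) with cache_control
# so the provider can reuse it across calls.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "Very long system prompt and tool definitions ...",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "What should I do next?"},
]

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # example model
    messages=messages,
)
```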

@github-actions github-actions bot removed the Stale Inactive for 30 days label Oct 20, 2024