[Investigation] Format of completion calls issues, provider-specific features and refactoring #3812
Labels
- `backend`: Related to backend
- `enhancement`: New feature or request
- `severity:medium`: Affecting multiple users
Summary
Investigate which compatibility tweaks for providers are necessary and why; isn't liteLLM already doing this? If something is missing from liteLLM, maybe we can help them add it, and in the meantime clearly mark the places where our code is temporary.
Motivation
The LLM and Message classes, and the handling of Messages in the agent, have become heavy and include quite a lot of fixes for provider issues or special cases/features. These are sometimes extremely provider-specific:
- `supports_prompt_caching` property for LLMs (BerriAI/litellm#5776)
- Some providers need string content for user messages. So is the manual formatting necessary? The system message is a single string, so we could serialize that separately.
- System and assistant messages are single strings.
- One provider without vision is reported to have a similar error, needing the system message as a string, while its API for user messages seems compatible with pydantic serialization. Does anything need content types for the system role? If not, we can standardize on a system message as a string instead.
- Groq offers an OpenAI-compatible API endpoint, which suggests we could use the regular messages format (including vision-enabled) without provider-specific tweaks. (Are we using it? If not, why not?) However, the compatibility is partial, and these incompatibilities are undocumented.

While this seems like an overdue standardization and makes the choice easier, are we missing anything: can users still specify "openai/" to have liteLLM re-route the calls to OpenAI-compatible endpoints? Is it possible or desirable to add it ourselves for some providers, or instead of some providers, to guide the choice of the correct endpoint?
Technical Design
Alternatives to Consider
We can continue to tweak things ourselves, but the code is becoming difficult to work with, and issues keep cropping up due to the (warranted? unwarranted?) complexity.
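For reference on the "openai/" routing question above: liteLLM treats a model string with the `openai/` prefix as an OpenAI-compatible endpoint and sends the standard messages format to the given `api_base`. The sketch below only builds the call arguments; the model name and URL are placeholders, not a tested configuration.

```python
# Arguments for a liteLLM call routed to an OpenAI-compatible endpoint.
# "openai/" prefix -> liteLLM's OpenAI-compatible code path;
# api_base -> the provider's compatible endpoint (placeholder URL).
call_kwargs = {
    "model": "openai/my-model",
    "api_base": "https://example.com/v1",
    "messages": [{"role": "user", "content": "Hello"}],
}

# The actual call (requires litellm installed and a live endpoint):
# import litellm
# response = litellm.completion(**call_kwargs)
```

If this routing works for the providers in question, it could replace some of the per-provider formatting tweaks rather than adding to them.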