
[Investigation] Format of completion calls issues, provider-specific features and refactoring #3812

Open
enyst opened this issue Sep 11, 2024 · 3 comments
Labels
backend Related to backend enhancement New feature or request severity:medium Affecting multiple users

Comments

enyst (Collaborator) commented Sep 11, 2024

Summary
Investigate which provider compatibility tweaks are necessary and why; isn't liteLLM supposed to handle this? If something is missing from liteLLM, maybe we can help them add it, and in the meantime clearly comment where our code is temporary.

Motivation
The LLM and Message classes, and the agent code working with Messages, have become heavy and include quite a few fixes for provider issues or special cases/features. Some of these are extremely provider-specific:

  • Prompt caching is currently Anthropic-only, and we can look to simplify/abstract it. Example: context caching for Gemini is very different from prompt caching for Anthropic, but liteLLM has already implemented a compatibility layer that makes them very similar in actual use. We could adopt it and take the opportunity to refactor the implementation a bit.
  • The prompt caching feature is relatively new and, in terms of the format tweaks and cross-API compatibility it needs, similar to the vision feature, yet I haven't found a liteLLM "supports_caching" analogous to "supports_vision". Maybe we can propose it to liteLLM; it would help us avoid hardcoding providers (see the sketch after this list). Ref: [Feature]: supports_prompt_caching property for LMs BerriAI/litellm#5776
  • The Groq API can return errors, and to work around them we have manually serialized messages to single strings, yet according to its published API it supports messages as arrays of content parts, as produced by our pydantic serialization for user messages. So is the manual formatting necessary? The system message is a single string, so we could serialize that separately; system and assistant messages are single strings.
  • Gemini works with system and assistant messages as strings and user messages as vision-like dicts. Without vision, it is reported to raise a similar error, needing the system message as a string, while its API for user messages seems compatible with pydantic serialization. Does anything need content parts for the system role? If not, we can standardize on the system message as a string instead.
    - Groq offers an OpenAI-compatible API endpoint, which suggests we could use the regular messages format (including vision-enabled) without provider-specific tweaks. (Are we using it? If not, why not?) However, the compatibility is partial, and these incompatibilities are undocumented.
  • We have recently changed the FE to select a provider and then a model per provider. While this seems like an overdue standardization and makes the choice easier, are we missing anything: can users still specify "openai/" to have liteLLM re-route the calls to OpenAI-compatible endpoints? Is it possible or desirable to add this prefix ourselves for some providers, or instead of some providers, to guide the choice of the correct endpoint?
  • related issue, hopefully to be solved in liteLLM
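
To make the hardcoding concrete, here is a minimal sketch of the kind of checks and serialization we currently spell out by hand. litellm.supports_vision is an existing liteLLM helper; supports_prompt_caching below is the hypothetical helper requested in BerriAI/litellm#5776, and the model list and flatten_content are illustrative stand-ins for our current workarounds, not the actual repo code.

```python
import litellm

# Illustrative list of models we currently hardcode as supporting prompt caching
# (a stand-in for the real list, which lives in our LLM code today).
CACHE_PROMPT_SUPPORTED = (
    "claude-3-5-sonnet-20240620",
    "claude-3-haiku-20240307",
)


def supports_prompt_caching(model: str) -> bool:
    """Hypothetical helper: what a liteLLM `supports_prompt_caching` could replace."""
    return any(model.endswith(name) for name in CACHE_PROMPT_SUPPORTED)


def flatten_content(content) -> str:
    """Illustrates the manual Groq workaround: collapse a list of content parts
    (as produced by pydantic serialization) into a single string."""
    if isinstance(content, str):
        return content
    return "\n".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )


model = "anthropic/claude-3-5-sonnet-20240620"

# Vision support is already queryable through liteLLM; caching support is not (yet).
if litellm.supports_vision(model=model):
    print("keep image content parts in user messages")
if supports_prompt_caching(model):
    print("add cache_control markers before sending the messages")
```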

Technical Design

  • Rely on liteLLM to offer OpenAI-compatible endpoints for all models and providers we work with, and clean up the code as much as possible by doing just that (see the sketch after this list).
  • Comment in big scary letters where the code is temporary 😅
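
As a minimal sketch of the "just let liteLLM do it" approach (the model names and the local api_base below are examples, not what we ship), the same OpenAI-format messages are sent to every provider and routing is driven purely by the model prefix, including the "openai/" prefix mentioned above for OpenAI-compatible endpoints:

```python
import litellm

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [{"type": "text", "text": "Hello!"}]},
]

# Same call shape for every provider; liteLLM translates the message format.
for model in (
    "anthropic/claude-3-5-sonnet-20240620",
    "groq/llama-3.1-70b-versatile",
    "gemini/gemini-1.5-pro",
):
    resp = litellm.completion(model=model, messages=messages)
    print(model, resp.choices[0].message.content)

# The "openai/" prefix routes to any OpenAI-compatible endpoint,
# e.g. a self-hosted server (api_base here is only an example value).
resp = litellm.completion(
    model="openai/my-local-model",
    api_base="http://localhost:8000/v1",
    messages=messages,
)
```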

Alternatives to Consider
We can continue to tweak things ourselves, but it's becoming difficult to work with. Issues keep cropping up due to the (warranted? unwarranted?) complexity.

enyst (Collaborator, Author) commented Sep 11, 2024

Also, I think we lack test coverage for the code handling Messages. We have one test for a small part of the prompt caching feature.
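
As a self-contained sketch of the missing coverage (the Message and TextContent classes below are stand-ins for the real ones in the repo, just to show the shape such tests could take):

```python
from typing import Literal

from pydantic import BaseModel


class TextContent(BaseModel):
    """Stand-in for the repo's text content part."""
    type: Literal["text"] = "text"
    text: str


class Message(BaseModel):
    """Stand-in for the repo's Message model."""
    role: Literal["system", "user", "assistant"]
    content: list[TextContent]


def test_user_message_serializes_to_content_parts():
    msg = Message(role="user", content=[TextContent(text="hello")])
    assert msg.model_dump()["content"] == [{"type": "text", "text": "hello"}]


def test_system_message_can_be_flattened_to_a_string():
    msg = Message(role="system", content=[TextContent(text="be helpful")])
    flattened = "\n".join(part.text for part in msg.content)
    assert flattened == "be helpful"
```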

@enyst enyst added enhancement New feature or request backend Related to backend severity:medium Affecting multiple users labels Sep 11, 2024
github-actions bot (Contributor) commented

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

@github-actions github-actions bot added the Stale Inactive for 30 days label Oct 19, 2024
enyst (Collaborator, Author) commented Oct 19, 2024

I think there are two issues remaining here, both related to prompt caching:

  • prompt caching for Gemini, unsolved
  • a temporary variable in llm.py for Anthropic models, which was fixing an issue with the UI model choice before liteLLM did (see the sketch after this list)
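
For reference, a minimal sketch of what the prompt-caching markup looks like when routed through liteLLM's Anthropic compatibility layer, which is roughly what the temporary code in llm.py gates on (the model name is an example; exact header/beta handling is left to liteLLM):

```python
import litellm

# Mark the large, stable prefix (e.g. the system prompt) with cache_control
# so the provider can reuse it across calls.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "Very long system prompt and tool definitions ...",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "What should I do next?"},
]

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # example model
    messages=messages,
)
```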

@github-actions github-actions bot removed the Stale Inactive for 30 days label Oct 20, 2024