Refactor messages serialization #3832
Conversation
The format_message method is as big as it is for a reason. It took hours of tweaking and testing to make it work with the cases at hand. You also need to try a full integration regenerate test with this. 😬
@@ -78,6 +77,29 @@ def get_log_id(prompt_log_name):
    return match.group(1)


def _format_messages(messages):
What's the thought process here of mocking this out compared to the previous usage that used the actual format_messages call?
🤔 The reason something like this is used in integration tests: they compare the prompt as created and sent to a mock LLM by the agent against existing log files of the same thing - files that log prompts. So this is imitating the logging into a file...
I don't understand the context of this PR, so my comments are just about style/nitpicks
    'role': role,
    'content': content_str,
}
elif self.role == 'assistant' and not self.contains_image:
nit: this branch seems mergeable with previous branch
You're right; I would suggest keeping it at least for now, though. The branching helps my brain see how we serialize each of the 3 roles, and I suspect we will continue to need to look at / tweak this code...
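For readers without the full diff, the three-role branching discussed here might look roughly like the following; this is a hypothetical sketch with assumed names, not the PR's actual code:

```python
def serialize_message(role, text_parts, image_urls=()):
    # Flattened single-string form of the text content
    content_str = '\n'.join(text_parts)
    if role == 'system':
        # system content as a plain string (some providers require this)
        return {'role': role, 'content': content_str}
    elif role == 'assistant' and not image_urls:
        # assistant messages without images: also a plain string
        return {'role': role, 'content': content_str}
    else:
        # user (and any image-bearing) messages keep the vision list format
        content = [{'type': 'text', 'text': t} for t in text_parts]
        content += [
            {'type': 'image_url', 'image_url': {'url': u}} for u in image_urls
        ]
        return {'role': role, 'content': content}
```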
Before we merge this, I'd like this PR to get tested against Gemini, a Llama 3.1 model on Groq, and of course Sonnet. I guess this could be a manual workflow file calling a single .py test file (in the /tests folder) that we can improve upon and run manually from CI. What do you guys think? @enyst @xingyaoww
Hey @tobitege, I understand the feeling, but I also think this might be a bit of a heavy lift to get this PR integrated. Maybe we could just make a best effort for this PR and put that on the list of enhancements for the future.
LGTM
@enyst: please feel free to merge if you think this is ready.
To clarify the issue here a bit, for the record: since the introduction of vision support, we have changed the format in which we send messages to the LLM. The format we started sending was the openai-compatible format for vision, like:
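Roughly, a user message in that format looks like this (placeholder values for illustration, not taken from the PR):

```python
# openai-compatible vision format: 'content' is a list of typed parts
vision_message = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'What is in this image?'},
        {'type': 'image_url', 'image_url': {'url': 'data:image/png;base64,...'}},
    ],
}

# the old, simpler format: 'content' is a single string
plain_message = {'role': 'user', 'content': 'What is in this image?'}
```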
It has a list of dicts of content types, instead of the old, simpler format where 'content' is just a single string. Content types are 'text' and 'image_url'. Things seemed to work, until they didn't: in reality, as @tobitege found and fixed already, multiple providers don't support this format, or don't support it fully. They appear to support this for …

This PR merely aims to simplify the first take on this:
Note: I think we had some code in serialization that was used only in integration tests. This PR moves it to tests. IMHO the code has become way too complex, and we'll be happier if we keep the core code doing core stuff and the tests doing test stuff.
CHANGELOG
Refactor the serialization of messages for vision, prompt caching, Groq incompatibilities.
Give a summary of what the PR does, explaining any non-trivial design decisions
Groq documents on their site that their API is openai-compatible, and lists only a few limitations that don't affect us. Sadly, that's not quite right: for the vision format, their API gives 400 errors because it requires both `system` and `assistant` messages to be simple strings (as opposed to accepting lists of text/image content). Cc: @tobitege

I had a wild hope that it was only the system message, but the fact that the assistant message is also not supported makes it impossible to merge the implementations completely for now... It's possible that the vision models are still too new, and Groq and/or liteLLM haven't yet adapted to all the changes. I'll follow up on that. In the meantime, we have a potential bug in the Message class, and I took the opportunity to synchronize the two implementations (vision and non-vision) we have.
Tested with: Groq/Llama 3.1 70B, Gemini 1.5 Flash (AI Studio), o1, Sonnet 3.5
Part of #3812