
Refactor messages serialization #3832

Merged: 19 commits merged into main from enyst/messages on Sep 18, 2024
Conversation

@enyst (Collaborator) commented Sep 11, 2024

CHANGELOG
Refactor the serialization of messages for vision, prompt caching, and Groq incompatibilities.

  • fix a potential bug with a shared list
  • refactor the formatting we need to do into the serialization methods on the Message classes

Give a summary of what the PR does, explaining any non-trivial design decisions
Groq documents on their site that their API is OpenAI-compatible, listing only a few limitations that don't affect us. Sadly, that's not quite right: with the vision format, their API returns 400 errors because it requires both system and assistant messages to be simple strings (as opposed to lists of text/image content). Cc: @tobitege. I had a wild hope that it was only the system message, but the fact that the assistant message is also unsupported makes it impossible to fully merge the two implementations for now...

It's possible that the vision models are still too new, and Groq and/or liteLLM haven't yet adapted to all the changes. I'll follow up on that... In the meantime, we have a potential bug in the Message class, and I took the opportunity to synchronize the two implementations (vision and non-vision) we have.

Tested with: Groq/Llama 3.1 70B, Gemini 1.5 Flash (AI Studio), o1, Sonnet 3.5

Part of #3812

@tobitege (Collaborator) commented:

The format_message method is as big as it is for a reason: it took hours of tweaking and testing to make it work with the cases at hand. You also need to try a full integration regenerate test with this. 😬

@@ -78,6 +77,29 @@ def get_log_id(prompt_log_name):
return match.group(1)


def _format_messages(messages):
Contributor commented:

What's the thought process behind mocking this out, compared to the previous usage that called the actual format_messages?

@enyst (Collaborator, Author) replied:

🤔 The reason something like this is used in integration tests is that they compare the prompt created and sent to a mock LLM by the agent against existing log files of the same thing - files that log prompts. So this is imitating the logging into a file...
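
For context, here is a minimal sketch of what such a test-side helper could look like. The name _format_messages comes from the diff above, but the body and the exact log layout are assumptions for illustration, not the actual OpenHands code: it flattens each message into the plain-text shape the prompt log files use, so logged prompts can be compared.

# Hypothetical sketch: flatten messages into the plain-text form the
# prompt logs use. Separators and exact layout are assumptions.
def _format_messages(messages):
    lines = []
    for message in messages:
        content = message.get('content', '')
        if isinstance(content, list):
            # vision-style content: keep only the text parts
            content = '\n'.join(
                part.get('text', '')
                for part in content
                if part.get('type') == 'text'
            )
        lines.append(f"{message['role']}:\n{content}")
    return '\n\n'.join(lines)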

@li-boxuan (Collaborator) left a comment:

I don't understand the context of this PR, so my comments are just style nitpicks.

config.template.toml (review thread resolved)
'role': role,
'content': content_str,
}
elif self.role == 'assistant' and not self.contains_image:
Collaborator commented:

nit: this branch seems mergeable with the previous branch

@enyst (Collaborator, Author) replied:

You're right. I'd suggest keeping it, at least for now, though: the branching helps my brain see how we serialize each of the three roles, and I suspect we'll continue to need to look at / tweak this code...

@tobitege (Collaborator) commented Sep 17, 2024

Before we merge this, I'd like this PR to get tested against Gemini, a Llama 3.1 model on Groq, and of course Sonnet.
All, if possible, with vision on and off (where supported) and caching on and off (Sonnet).

I guess this could be a manual workflow file calling a single .py test file (in the /tests folder) that we can improve upon and run manually from CI.
I'm just not sure how to deal with different LLM configs in the CLI (no toml, but coded?), or whether we can use the all-hands proxy for all models?

What do you guys think? @enyst @xingyaoww

@neubig (Contributor) commented Sep 17, 2024

Hey @tobitege, I understand the feeling, but I also think that might be a bit of a heavy lift for getting this PR integrated. Maybe we could just make a best effort for this PR and put the rest on the list of enhancements for the future.

@tobitege (Collaborator) left a comment:

LGTM

@neubig (Contributor) commented Sep 18, 2024

@enyst : please feel free to merge if you think this is ready.

@enyst (Collaborator, Author) commented Sep 18, 2024

To clarify the issue here a bit, for the record: since the introduction of vision support, we have changed the format in which we send messages to the LLM. We started sending the OpenAI-compatible vision format, like:

{'content': [{'type': 'text', 'text': 'Ask me what your task is'}, ...], 'role': 'user'}

It has a list of dicts of content types, instead of the old, simpler format where 'content' is just a single string. The content types are 'text' and 'image_url'.
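
For comparison, the same message in the old, simpler format (content as a single string):

{'content': 'Ask me what your task is', 'role': 'user'}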

Things seemed to work, until they didn't: in reality, as @tobitege already found and fixed, multiple providers don't support this format, or don't support it fully. They appear to support it for role: user, but not for role: system and, sadly, not for role: assistant. The fix did what was necessary: this situation basically requires us to support two kinds of serialization, pending future fixes from litellm/providers. Tobi did the hard work on this and restored the non-vision format so that things work.

This PR merely aims to simplify that first take:

  • it settles on a compromise: serialize user messages in the vision-like format, but the other roles as plain strings when vision isn't enabled (see the sketch after this list). This works in all cases I've seen so far.
  • it refactors how we do serialization: it was in two places, now it's in one place, pydantic-decorated Message serializers.
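
A minimal sketch of the shape this takes, assuming pydantic v2's model_serializer; the class fields and the vision flag are illustrative assumptions, not the exact OpenHands classes:

from pydantic import BaseModel, model_serializer

class TextContent(BaseModel):
    type: str = 'text'
    text: str

class Message(BaseModel):
    role: str  # 'system' | 'user' | 'assistant'
    content: list[TextContent]
    vision_enabled: bool = False  # illustrative flag, not the real field name

    @model_serializer
    def serialize_model(self) -> dict:
        # user messages keep the vision-style list-of-dicts shape;
        # system/assistant fall back to plain strings unless vision is on
        if self.role == 'user' or self.vision_enabled:
            return {
                'role': self.role,
                'content': [c.model_dump() for c in self.content],
            }
        return {
            'role': self.role,
            'content': '\n'.join(c.text for c in self.content),
        }

With something like this, Message(...).model_dump() yields the provider-ready dict in one place, instead of formatting messages at each call site.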

Note: I think we had some code in serialization that was used only in integration tests. This PR moves it to the tests. IMHO the code had become way too complex, and we'll be happier if we keep the core code doing core stuff and the tests doing test stuff.

@enyst merged commit 8fdfece into main on Sep 18, 2024
13 checks passed
@enyst deleted the enyst/messages branch on September 18, 2024 at 21:48