-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: self discover agent #3437
Conversation
Thanks for building and contributing this! 🤗 |
500fed6
to
a8d655d
Compare
|
||
SYSTEM_SUFFIX = """Let the following principles guide your response: | ||
- Before responding read all information carefully. | ||
- Take time to think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commenting: I wonder whether this instruction was ever followed. 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good question. I don't have any data to answer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to see some test for this, if possible. Maybe in a follow up PR.
You MUST NOT include any other text besides the JSON response. | ||
""" | ||
|
||
# TODO: modify example so that it is consistent with prompting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this commented out code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will remove the commented-out code for now and just leave the Todo comment
Can you showcase some simple examples? Screenshots would suffice. Then maybe we could run some evaluation against it. C.C. @xingyaoww |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! Overall this looks good to me! I don't see major road block to get this merged. A few things:
- We probably need to generalize the Jinja2 template (used in CodeAct now https://github.com/OpenDevin/OpenDevin/blob/14a4e45cbbbb01eba01b47e5d874c2583decdfd8/agenthub/codeact_agent/codeact_agent.py#L76-L80) at some point to better support different needs - this agent requires multiple different ways to compose prompt. I think we can merge this as is, and figure out ways to optimize later. (I think that's probably the downside of abstraction - you'd step on your own toe 😅)
- We should remove the comments.
Btw, @sb-git-cloud if you are interested in running some SWE-Bench eval to validate the agent, i'd be happy to sponsor some API credits - just lmk in the slack :D
@@ -195,6 +195,11 @@ def _get_messages(self, state: State) -> list[Message]: | |||
), | |||
] | |||
|
|||
if 'task' in state.inputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this absolutely necessary for CodeAct to be delegated by other agents? cc @li-boxuan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, at least last I checked. The task was not accessible to the agent via an event.
I've been thinking we could return it from get_current_user_intent()
though. It would be easier to use like goal=state.get_user_intent()
for everything, delegate or not.
The issue is, if I recall correctly, that children start with zero history. The task is included in the DelegateAction, but the child doesn't have access to that action. We'll need Li Boxuan to confirm, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract is messy and there lacks a clear API between agents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like for now the best is to leave it as is, and once the API between agents is defined we can update the agent with a follow-up PR.
implement_example = IMPLEMENT_EXAMPLE.format( | ||
implement_state_key=SelfDiscoverState.IMPLEMENT.value, | ||
task_key=TASK_KEY, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice if we can use the newly implemented Jinja template for this agent - but no pressure - we can do it follow up PR.
If the template is too rigid, we can figure out a way to generalize it a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, given your comment below let's defer to a future PR.
""" | ||
|
||
|
||
def get_prompt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought: the prompt here is probably a little bit too complex to support for the Jinja2 template in its current shape - we can defer that to future PR.
You MUST NOT include any other text besides the JSON response. | ||
""" | ||
|
||
# TODO: modify example so that it is consistent with prompting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR still needs to register the agent in the agenthub registry (agenthub/__init__.py
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can you add some README.md under this folder to explain more context about self-discover agent? I also think we need some evalution result on SWE bench before merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little curious on this, what result are we looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sounds good. Thanks for the suggestion. I will add a readme and run some evaluations.
Thank you all for your reviews! Will work on the related changes. |
Hi @sb-git-cloud thanks so much for your contribution. Just wanted to check if this is something that is generally on your radar? |
Thanks for checking in. Yes. Haven't had the chance to run evaluations. Hopefully I will get to it next week. |
Hey @sb-git-cloud , if you're interested in running evaluations, things are now a bit easier with our remote runtime: https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/swe_bench#run-inference-on-remoteruntime-experimental If you'd like to use that we'd be happy to help out. |
This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This PR was closed because it has been stalled for over 30 days with no activity. |
What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?
Implements a Self Discover agent, which provides a step-by-step plan to CodeActAgent. This is related to the open PR #46.
Give a summary of what the PR does, explaining any non-trivial design decisions
The agent first chooses reasoning modules that may be relevant for the user-given task. Then it adapts those to the specific task and finally it creates a plan, which is then sent to CodeAct for execution.
As of now it is in beta version.
Other references