langgraph: add message list validation to create_react_agent + a troubleshooting guide #2182

Merged: 6 commits merged on Nov 1, 2024
3 changes: 2 additions & 1 deletion docs/docs/how-tos/index.md
@@ -1,4 +1,4 @@
---

Check notice on line 1 in docs/docs/how-tos/index.md (GitHub Actions / benchmark)

Benchmark results (mean +- std dev):

- fanout_to_subgraph_10x: 48.2 ms +- 0.8 ms
- fanout_to_subgraph_10x_sync: 43.8 ms +- 0.5 ms
- fanout_to_subgraph_10x_checkpoint: 77.0 ms +- 1.6 ms
- fanout_to_subgraph_10x_checkpoint_sync: 84.7 ms +- 0.9 ms
- fanout_to_subgraph_100x: 470 ms +- 10 ms
- fanout_to_subgraph_100x_sync: 427 ms +- 5 ms
- fanout_to_subgraph_100x_checkpoint: 791 ms +- 47 ms
- fanout_to_subgraph_100x_checkpoint_sync: 837 ms +- 18 ms
- react_agent_10x: 31.9 ms +- 3.8 ms (pyperf warning: the result may be unstable, since the standard deviation (3.79 ms) is 12% of the mean (31.9 ms); rerun with more runs, values and/or loops, or run `python -m pyperf system tune` to reduce system jitter)
- react_agent_10x_sync: 22.8 ms +- 1.7 ms
- react_agent_10x_checkpoint: 47.3 ms +- 3.2 ms
- react_agent_10x_checkpoint_sync: 36.9 ms +- 2.7 ms
- react_agent_100x: 326 ms +- 6 ms
- react_agent_100x_sync: 262 ms +- 3 ms
- react_agent_100x_checkpoint: 906 ms +- 8 ms
- react_agent_100x_checkpoint_sync: 816 ms +- 8 ms
- wide_state_25x300: 18.5 ms +- 0.4 ms
- wide_state_25x300_sync: 10.8 ms +- 0.1 ms
- wide_state_25x300_checkpoint: 278 ms +- 12 ms
- wide_state_25x300_checkpoint_sync: 268 ms +- 13 ms
- wide_state_15x600: 21.5 ms +- 0.5 ms
- wide_state_15x600_sync: 12.5 ms +- 0.1 ms
- wide_state_15x600_checkpoint: 477 ms +- 12 ms
- wide_state_15x600_checkpoint_sync: 464 ms +- 14 ms
- wide_state_9x1200: 21.4 ms +- 0.4 ms
- wide_state_9x1200_sync: 12.5 ms +- 0.1 ms
- wide_state_9x1200_checkpoint: 312 ms +- 12 ms
- wide_state_9x1200_checkpoint_sync: 300 ms +- 12 ms

Comparison against main:

| Benchmark | main | changes |
|---|---|---|
| wide_state_25x300_sync | 10.8 ms | 10.8 ms: 1.00x slower |
| fanout_to_subgraph_10x | 48.0 ms | 48.2 ms: 1.00x slower |
| fanout_to_subgraph_10x_sync | 43.6 ms | 43.8 ms: 1.01x slower |
| wide_state_15x600_sync | 12.5 ms | 12.5 ms: 1.01x slower |
| wide_state_15x600 | 21.3 ms | 21.5 ms: 1.01x slower |
| fanout_to_subgraph_100x_sync | 423 ms | 427 ms: 1.01x slower |
| wide_state_25x300 | 18.3 ms | 18.5 ms: 1.01x slower |
| react_agent_100x_checkpoint | 898 ms | 906 ms: 1.01x slower |
| fanout_to_subgraph_10x_checkpoint_sync | 83.8 ms | 84.7 ms: 1.01x slower |
| wide_state_9x1200_sync | 12.4 ms | 12.5 ms: 1.01x slower |
| fanout_to_subgraph_100x_checkpoint_sync | 827 ms | 837 ms: 1.01x slower |
| fanout_to_subgraph_100x | 463 ms | 470 ms: 1.02x slower |
| react_agent_100x_checkpoint_sync | 803 ms | 816 ms: 1.02x slower |
| react_agent_10x_checkpoint | 46.4 ms | 47.3 ms: 1.02x slower |
| react_agent_100x | 317 ms | 326 ms: 1.03x slower |
| react_agent_10x_sync | 22.2 ms | 22.8 ms: 1.03x slower |
| react_agent_100x_sync | 252 ms | 262 ms: 1.04x slower |
| react_agent_10x | 30.7 ms | 31.9 ms: 1.04x slower |
| Geometric mean | (ref) | 1.01x slower |

Benchmarks hidden because not significant (10): wide_state_9x1200, wide_state_15x600_checkpoint, wide_state_25x300_checkpoint, wide_state_25x300_checkpoint_sync, wide_state_15x600_checkpoint_sync, wide_state_9x1200_checkpoint_sync, wide_state_9x1200_checkpoint, fanout_to_subgraph_10x_checkpoint, fanout_to_subgraph_100x_checkpoint, react_agent_10x_checkpoint_sync

hide:
- navigation
title: How-to Guides
@@ -220,11 +220,12 @@

## Troubleshooting

The [Error Reference](../troubleshooting/errors/index.md) page contains guides around resolving common errors you may find while building with LangGraph. Errors referenced below will have an `lc_error_code` property corresponding to one of the below codes when they are thrown in code.
These are the guides for resolving common errors you may find while building with LangGraph. Errors referenced below will have an `lc_error_code` property corresponding to one of the below codes when they are thrown in code.

- [GRAPH_RECURSION_LIMIT](../troubleshooting/errors/GRAPH_RECURSION_LIMIT.md)
- [INVALID_CONCURRENT_GRAPH_UPDATE](../troubleshooting/errors/INVALID_CONCURRENT_GRAPH_UPDATE.md)
- [INVALID_GRAPH_NODE_RETURN_VALUE](../troubleshooting/errors/INVALID_GRAPH_NODE_RETURN_VALUE.md)
- [MULTIPLE_SUBGRAPHS](../troubleshooting/errors/MULTIPLE_SUBGRAPHS.md)
- [INVALID_CHAT_HISTORY](../troubleshooting/errors/INVALID_CHAT_HISTORY.md)


30 changes: 30 additions & 0 deletions docs/docs/troubleshooting/errors/INVALID_CHAT_HISTORY.md
@@ -0,0 +1,30 @@
# INVALID_CHAT_HISTORY

This error is raised in the prebuilt [create_react_agent][langgraph.prebuilt.chat_agent_executor.create_react_agent] when the `call_model` graph node receives a malformed list of messages. Specifically, the list is malformed when it contains `AIMessage`s with `tool_calls` (the LLM requesting to call a tool) that do not have a corresponding `ToolMessage` (the result of a tool invocation, returned to the LLM).
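A minimal sketch of the pairing rule, assuming simplified stand-in classes: `AIMsg` and `ToolMsg` below are hypothetical dataclasses, not the real `langchain_core` message types. It collects every tool-call id from AI messages and reports the calls whose id is not answered by any tool message. Like the validator added in this PR, it only checks that a matching id exists somewhere in the history, not where it appears.

```python
from dataclasses import dataclass, field


@dataclass
class AIMsg:
    """Hypothetical stand-in for AIMessage: each tool call is a dict with an "id" key."""
    content: str
    tool_calls: list = field(default_factory=list)


@dataclass
class ToolMsg:
    """Hypothetical stand-in for ToolMessage: tool_call_id links it to a tool call."""
    content: str
    tool_call_id: str


def unanswered_tool_calls(messages):
    """Return the tool calls, in order, whose id has no matching ToolMsg in the history."""
    answered = {m.tool_call_id for m in messages if isinstance(m, ToolMsg)}
    return [
        call
        for m in messages
        if isinstance(m, AIMsg)
        for call in m.tool_calls
        if call["id"] not in answered
    ]


history = [
    AIMsg(
        "Checking.",
        tool_calls=[
            {"id": "call1", "name": "get_weather", "args": {}},
            {"id": "call2", "name": "get_time", "args": {}},
        ],
    ),
    ToolMsg("Sunny, 75°F", tool_call_id="call1"),
]
print([c["id"] for c in unanswered_tool_calls(history)])  # ['call2']
```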

There could be a few reasons you're seeing this error:

1. You manually passed a malformed list of messages when invoking the graph, e.g. `graph.invoke({'messages': [AIMessage(..., tool_calls=[...])]})`.
2. The graph was interrupted before receiving updates from the `tools` node (i.e. a list of `ToolMessage`s), and you then invoked it with an input that is neither `None` nor a `ToolMessage`, e.g. `graph.invoke({'messages': [HumanMessage(...)]}, config)`.

    This interrupt could have been triggered in one of the following ways:

    - You manually set `interrupt_before=['tools']` in `create_react_agent`
    - One of the tools raised an error that wasn't handled by the [ToolNode][langgraph.prebuilt.tool_node.ToolNode] (`"tools"`)
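To see why the second cause fails, here is a minimal sketch that models messages as plain dicts (a hypothetical simplification, not the `langchain_core` types) and assumes the new input is appended to the saved history, as the `add_messages` reducer does for messages with new ids. The function name `unanswered_ids` is made up for illustration.

```python
def unanswered_ids(messages):
    """Ids of tool calls in AI messages that no tool message answers (order-insensitive)."""
    answered = {m["tool_call_id"] for m in messages if m["type"] == "tool"}
    return [
        call["id"]
        for m in messages
        if m["type"] == "ai"
        for call in m.get("tool_calls", [])
        if call["id"] not in answered
    ]


# State saved when the graph paused before the "tools" node:
# the model requested a tool, but the tool never ran.
paused_state = [
    {"type": "human", "content": "What's the weather?"},
    {
        "type": "ai",
        "content": "",
        "tool_calls": [{"id": "call1", "name": "get_weather", "args": {}}],
    },
]

# Resuming with a new human message only appends to the history,
# so the next model call still sees the unanswered tool call.
with_human = paused_state + [{"type": "human", "content": "Never mind."}]
print(unanswered_ids(with_human))  # ['call1']

# Appending a tool message with the matching id makes the history valid.
with_tool = paused_state + [{"type": "tool", "content": "Sunny", "tool_call_id": "call1"}]
print(unanswered_ids(with_tool))  # []
```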

## Troubleshooting

To resolve this, you can do one of the following:

1. Don't invoke the graph with a malformed list of messages.
2. In case of an interrupt (manual or due to an error), you can:

    - provide `ToolMessage`s that match the existing tool calls and call `graph.invoke({'messages': [ToolMessage(...)]}, config)`.
      **NOTE**: this will append the messages to the history and run the graph from the START node.
    - manually update the state and resume the graph from the interrupt:

        1. get the list of most recent messages from the graph state with `graph.get_state(config)`
        2. modify the list of messages to either remove unanswered tool calls from `AIMessage`s or add `ToolMessage`s with `tool_call_id`s that match the unanswered tool calls
        3. call `graph.update_state(config, {'messages': ...})` with the modified list of messages
        4. resume the graph, e.g. call `graph.invoke(None, config)`
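Step 2 of the manual fix can be sketched as two list transformations. This is a hypothetical illustration: messages are plain dicts rather than `AIMessage`/`ToolMessage` objects, and the helper names and placeholder text are made up. It does not cover how the corrected list is merged back by `update_state`, which depends on the state's reducer (for `add_messages`, messages are matched by id).

```python
import copy


def answered_ids(messages):
    """Ids that already have a tool message somewhere in the history."""
    return {m["tool_call_id"] for m in messages if m["type"] == "tool"}


def drop_unanswered_tool_calls(messages):
    """Option A: remove tool calls that have no matching tool message (input is not mutated)."""
    answered = answered_ids(messages)
    fixed = copy.deepcopy(messages)
    for m in fixed:
        if m["type"] == "ai":
            m["tool_calls"] = [c for c in m.get("tool_calls", []) if c["id"] in answered]
    return fixed


def add_placeholder_tool_messages(messages, content="Tool call was cancelled."):
    """Option B: insert a placeholder tool message right after the AI message
    for every tool call that has no answer yet."""
    answered = answered_ids(messages)
    fixed = []
    for m in messages:
        fixed.append(m)
        if m["type"] == "ai":
            for c in m.get("tool_calls", []):
                if c["id"] not in answered:
                    fixed.append({"type": "tool", "content": content, "tool_call_id": c["id"]})
    return fixed


history = [
    {"type": "human", "content": "Weather and time?"},
    {"type": "ai", "content": "", "tool_calls": [
        {"id": "call1", "name": "get_weather", "args": {}},
        {"id": "call2", "name": "get_time", "args": {}},
    ]},
    {"type": "tool", "content": "Sunny", "tool_call_id": "call1"},
]
print(drop_unanswered_tool_calls(history)[1]["tool_calls"])
# [{'id': 'call1', 'name': 'get_weather', 'args': {}}]
print([m.get("tool_call_id") for m in add_placeholder_tool_messages(history)])
# [None, None, 'call2', 'call1']
```

Option B places each placeholder directly after the AI message that issued the call, since many providers expect tool results to follow the tool-calling message.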
1 change: 1 addition & 0 deletions docs/docs/troubleshooting/errors/index.md
@@ -7,3 +7,4 @@ Errors referenced below will have an `lc_error_code` property corresponding to o
- [INVALID_CONCURRENT_GRAPH_UPDATE](./INVALID_CONCURRENT_GRAPH_UPDATE.md)
- [INVALID_GRAPH_NODE_RETURN_VALUE](./INVALID_GRAPH_NODE_RETURN_VALUE.md)
- [MULTIPLE_SUBGRAPHS](./MULTIPLE_SUBGRAPHS.md)
- [INVALID_CHAT_HISTORY](./INVALID_CHAT_HISTORY.md)
1 change: 1 addition & 0 deletions libs/langgraph/langgraph/errors.py
@@ -12,6 +12,7 @@ class ErrorCode(Enum):
    INVALID_CONCURRENT_GRAPH_UPDATE = "INVALID_CONCURRENT_GRAPH_UPDATE"
    INVALID_GRAPH_NODE_RETURN_VALUE = "INVALID_GRAPH_NODE_RETURN_VALUE"
    MULTIPLE_SUBGRAPHS = "MULTIPLE_SUBGRAPHS"
    INVALID_CHAT_HISTORY = "INVALID_CHAT_HISTORY"


def create_error_message(*, message: str, error_code: ErrorCode) -> str:
34 changes: 34 additions & 0 deletions libs/langgraph/langgraph/prebuilt/chat_agent_executor.py
@@ -11,6 +11,7 @@
from typing_extensions import Annotated, TypedDict

from langgraph._api.deprecation import deprecated_parameter
from langgraph.errors import ErrorCode, create_error_message
from langgraph.graph import StateGraph
from langgraph.graph.graph import CompiledGraph
from langgraph.graph.message import add_messages
@@ -161,6 +162,37 @@ def _should_bind_tools(model: LanguageModelLike, tools: Sequence[BaseTool]) -> b
    return False


def _validate_chat_history(
    messages: Sequence[BaseMessage],
) -> None:
    """Validate that all tool calls in AIMessages have a corresponding ToolMessage."""
    all_tool_calls = [
        tool_call
        for message in messages
        if isinstance(message, AIMessage)
        for tool_call in message.tool_calls
    ]
    tool_call_ids_with_results = {
        message.tool_call_id for message in messages if isinstance(message, ToolMessage)
    }
    tool_calls_without_results = [
        tool_call
        for tool_call in all_tool_calls
        if tool_call["id"] not in tool_call_ids_with_results
    ]
    if not tool_calls_without_results:
        return

    error_message = create_error_message(
        message="Found AIMessages with tool_calls that do not have a corresponding ToolMessage. "
        f"Here are the first few of those tool calls: {tool_calls_without_results[:3]}.\n\n"
        "Every tool call (LLM requesting to call a tool) in the message history MUST have a corresponding ToolMessage "
        "(result of a tool invocation to return to the LLM) - this is required by most LLM providers.",
        error_code=ErrorCode.INVALID_CHAT_HISTORY,
    )
    raise ValueError(error_message)


@deprecated_parameter("messages_modifier", "0.1.9", "state_modifier", removal="0.3.0")
def create_react_agent(
    model: LanguageModelLike,
@@ -530,6 +562,7 @@ def should_continue(state: AgentState) -> Literal["tools", "__end__"]:

    # Define the function that calls the model
    def call_model(state: AgentState, config: RunnableConfig) -> AgentState:
        _validate_chat_history(state["messages"])
        response = model_runnable.invoke(state, config)
        has_tool_calls = isinstance(response, AIMessage) and response.tool_calls
        all_tools_return_direct = (
@@ -566,6 +599,7 @@ def call_model(state: AgentState, config: RunnableConfig) -> AgentState:
        return {"messages": [response]}

    async def acall_model(state: AgentState, config: RunnableConfig) -> AgentState:
        _validate_chat_history(state["messages"])
        response = await model_runnable.ainvoke(state, config)
        has_tool_calls = isinstance(response, AIMessage) and response.tool_calls
        all_tools_return_direct = (
66 changes: 66 additions & 0 deletions libs/langgraph/tests/test_prebuilt.py
@@ -46,6 +46,7 @@
    create_react_agent,
    tools_condition,
)
from langgraph.prebuilt.chat_agent_executor import _validate_chat_history
from langgraph.prebuilt.tool_node import (
    TOOL_CALL_ERROR_TEMPLATE,
    InjectedState,
@@ -378,6 +379,71 @@ def tool2(some_val: int) -> str:
create_react_agent(model.bind_tools([tool1]), [tool2])


def test__validate_messages():
    # empty input
    _validate_chat_history([])

    # single human message
    _validate_chat_history(
        [
            HumanMessage(content="What's the weather?"),
        ]
    )

    # human + AI
    _validate_chat_history(
        [
            HumanMessage(content="What's the weather?"),
            AIMessage(content="The weather is sunny and 75°F."),
        ]
    )

    # Answered tool calls
    _validate_chat_history(
        [
            HumanMessage(content="What's the weather?"),
            AIMessage(
                content="Let me check that for you.",
                tool_calls=[{"id": "call1", "name": "get_weather", "args": {}}],
            ),
            ToolMessage(content="Sunny, 75°F", tool_call_id="call1"),
            AIMessage(content="The weather is sunny and 75°F."),
        ]
    )

    # Unanswered tool calls
    with pytest.raises(ValueError):
        _validate_chat_history(
            [
                AIMessage(
                    content="I'll check that for you.",
                    tool_calls=[
                        {"id": "call1", "name": "get_weather", "args": {}},
                        {"id": "call2", "name": "get_time", "args": {}},
                    ],
                )
            ]
        )

    with pytest.raises(ValueError):
        _validate_chat_history(
            [
                HumanMessage(content="What's the weather and time?"),
                AIMessage(
                    content="I'll check that for you.",
                    tool_calls=[
                        {"id": "call1", "name": "get_weather", "args": {}},
                        {"id": "call2", "name": "get_time", "args": {}},
                    ],
                ),
                ToolMessage(content="Sunny, 75°F", tool_call_id="call1"),
                AIMessage(
                    content="The weather is sunny and 75°F. Let me check the time."
                ),
            ]
        )


def test__infer_handled_types() -> None:
    def handle(e):  # type: ignore
        return ""