
feat(llmobs): submit langchain tool.invoke tool spans #10410

Closed
wants to merge 19 commits into main from yahya/trace-base-tool

Conversation

@yahya-mouman (Contributor) commented Aug 27, 2024

This PR adds instrumentation to submit langchain tool.invoke/ainvoke tool spans to APM and LLM Observability.

The tool spans sent to APM will contain the following I/O data and tags:
langchain.request.tool.name
langchain.request.tool.description
langchain.request.tool.metadata
langchain.request.tool.tags
langchain.request.input
langchain.request.config
langchain.response.output
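For orientation, here is a minimal sketch of how a traced invoke wrapper might apply the tags listed above. `FakeSpan` is a hypothetical stand-in for ddtrace's `Span`, and truncation of I/O values is elided; only the tag names come from this PR.

```python
import json

# FakeSpan is illustrative only; the real integration tags a ddtrace Span.
class FakeSpan:
    def __init__(self):
        self.tags = {}

    def set_tag_str(self, key, value):
        self.tags[key] = value

def tag_tool_span(span, name, description, metadata, tags, tool_input, config):
    # Tag names mirror the list above.
    if name:
        span.set_tag_str("langchain.request.tool.name", str(name))
    if description:
        span.set_tag_str("langchain.request.tool.description", str(description))
    if metadata:
        span.set_tag_str("langchain.request.tool.metadata", json.dumps(metadata))
    if tags:
        span.set_tag_str("langchain.request.tool.tags", ",".join(tags))
    if tool_input:
        span.set_tag_str("langchain.request.input", str(tool_input))
    if config:
        span.set_tag_str("langchain.request.config", json.dumps(config, default=str))

span = FakeSpan()
tag_tool_span(
    span,
    name="circumference",
    description="Compute the circumference of a circle",
    metadata={"version": 1},
    tags=["math"],
    tool_input={"radius": 2},
    config={"run_name": "demo"},
)
```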

The tool spans sent to LLM Observability will contain the following I/O data:
input.value: a text field annotating the tool input
output.value: The output of the tool (can be any type)

Ticket: https://datadoghq.atlassian.net/browse/MLOB-935

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions bot commented Aug 27, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/feat-llmobs-submit-langchain-tool-spans-d7274ae79d3808e9.yaml  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke.json  @DataDog/apm-python
ddtrace/contrib/internal/langchain/patch.py                             @DataDog/ml-observability
ddtrace/llmobs/_integrations/langchain.py                               @DataDog/ml-observability
tests/contrib/langchain/test_langchain_community.py                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability

datadog-dd-trace-py-rkomorn bot commented Aug 27, 2024

Datadog Report

Branch report: yahya/trace-base-tool
Commit report: 7ec77b8
Test service: dd-trace-py

❌ 3 Failed (0 Known Flaky), 8351 Passed, 17030 Skipped, 1h 29m 22.5s Total Time

❌ Failed Tests (3)

  • test_base_tool_ainvoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (2 spans):
     Received more spans (3) than expected (2). Received unmatched spans: 'langchain.request'
    
  • test_base_tool_ainvoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (2 spans):
     Received fewer spans (1) than expected (2). Expected unmatched spans: 'langchain.request'
    
  • test_base_tool_invoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_invoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (1 spans):
     Received more spans (4) than expected (1). Received unmatched spans: 'langchain.request', 'langchain.request', 'langchain.request'
    

)
if tool_input:
    span.set_tag_str("langchain.request.input", integration.trunc(str(tool_input)))
if config:
@yahya-mouman (Contributor Author):

Is this enough, or should I parse the configs one by one and add them?

Reviewer (Contributor):

  1. We don't need to truncate config, as that isn't a direct I/O to the tool.
  2. We can probably just json-dump the config, as it is a TypedDict (see source).
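The second point can be sketched as follows. `RunnableConfig` in langchain-core is a TypedDict, i.e. a plain dict at runtime, so `json.dumps` with a `default` fallback handles entries that aren't JSON-serializable (such as callback objects). The config values here are illustrative.

```python
import json

# A RunnableConfig-like dict; at runtime a TypedDict is just a dict.
config = {
    "run_name": "circumference",
    "tags": ["math"],
    "recursion_limit": 25,
    "callbacks": [object()],  # not JSON-serializable without a fallback
}

# default=str converts anything json can't encode into its str() form.
serialized = json.dumps(config, default=str)
```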

Comment on lines 1040 to 1045

attrs={
    # Comment: Should we have more info in the log?
    "tool_name": instance.__self__.name or "",
    "input": tool_input,
    "config": config or "",
},
@yahya-mouman (Contributor Author):

Should we have more info in the log?



@with_traced_module
async def traced_base_tool_ainvoke(langchain, pin, func, instance, args, kwargs):
@yahya-mouman (Contributor Author):

Is there a better way to trace the async variant besides duplicating the code?

Reviewer (Contributor):

Unfortunately nope 😢 I think there are a couple of ways to depollute the patch module (extract the tracing functions into separate files, reuse common code in the sync/async functions), but we can keep this for now.

Reviewer (Contributor):

Same comments apply to the async function as above.

@@ -965,6 +965,170 @@ def traced_similarity_search(langchain, pin, func, instance, args, kwargs):
return documents


@with_traced_module
def traced_base_tool_invoke(langchain, pin, func, instance, args, kwargs):
@yahya-mouman (Contributor Author):

I couldn't think of any metrics to emit from this function.

Reviewer (Contributor):

Are you talking about Datadog metrics? If so, we can ignore this; that's out of scope for our team and this integration.

@yahya-mouman (Contributor Author):

Don't we use any of the metrics emitted from the langchain integration?

pr-commenter bot commented Aug 27, 2024

Benchmarks

Benchmark execution time: 2024-08-30 13:17:27

Comparing candidate commit 34096be in PR branch yahya/trace-base-tool with baseline commit 5b394f1 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 353 metrics, 47 unstable metrics.

@Yun-Kim (Contributor) left a comment:

Let's split out the APM integration and LLMObs integration into separate PRs for easier readability.

ddtrace/contrib/internal/langchain/patch.py (outdated; thread resolved)

span = integration.trace(
    pin,
    "%s.%s.%s" % (func.__module__, func.__class__.__name__, func.__self__.__class__.__name__),
Reviewer (Contributor):

I think here I'd prefer to just use the tool class name instead of the entire module/path, for simplicity/clarity. WDYT?
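The trade-off in concrete terms, using a hypothetical tool class:

```python
# CircumferenceTool is illustrative, standing in for a BaseTool subclass.
class CircumferenceTool:
    name = "circumference"

instance = CircumferenceTool()

# Option A: full dotted path, as in the snippet above (verbose).
full_name = "%s.%s" % (type(instance).__module__, type(instance).__name__)

# Option B (the reviewer's preference): just the tool class name.
short_name = type(instance).__name__
```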


tool_output = None
try:
    if instance.name:
        span.set_tag_str("langchain.request.tool.name", integration.trunc(str(instance.name)))
Reviewer (Contributor):

Truncation is only ever performed on inputs/outputs, so no need to do it here.

try:
    if instance.name:
        span.set_tag_str("langchain.request.tool.name", integration.trunc(str(instance.name)))
    if integration.is_pc_sampled_span(span):
Reviewer (Contributor):

pc_sampling refers to prompt/completion sampling, or inputs/outputs in this case. We can safely tag most of the items below without checking is_pc_sampled_span(...); the check is only needed for the input and the return value.
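In other words, only the prompt/completion-style fields go behind the gate. A sketch, with `is_pc_sampled_span` stubbed out (in ddtrace it is a rate-based sampling check) and dict-based tagging standing in for span tags:

```python
def is_pc_sampled_span(span, sample_rate=1.0):
    # Stub: the real check samples only a fraction of spans for I/O capture.
    return sample_rate >= 1.0

def tag_tool_span(span_tags, span, tool_name, tool_input, tool_output):
    # Metadata-style tags: always safe to set, no sampling gate needed.
    if tool_name:
        span_tags["langchain.request.tool.name"] = str(tool_name)
    # Prompt/completion-style data: gated behind the sampling check.
    if is_pc_sampled_span(span):
        if tool_input:
            span_tags["langchain.request.input"] = str(tool_input)
        if tool_output is not None:
            span_tags["langchain.response.output"] = str(tool_output)
    return span_tags

tags = tag_tool_span({}, None, "circumference", {"radius": 2}, 12.566)
```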


from math import pi

cassette_name = "langchain_tool_invoke.yaml"
with request_vcr.use_cassette(cassette_name):
Reviewer (Contributor):

AFAIK tools can be called without a network connection. Why do we need a cassette for this?

@yahya-mouman (Contributor Author):

Yes, gotcha! I was thinking of adding the tool to an LLM, but opted for a simple test instead and removed the cassette in an unpushed commit. Thanks!
I still left the option to call invoke with a cassette on the LLMObs side for any future agent tests.
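The network-free variant the reviewer is suggesting might look like this; `CircumferenceTool` is illustrative, not the PR's actual test code:

```python
from math import pi

# A purely local tool: no HTTP call, so no VCR cassette is needed.
class CircumferenceTool:
    name = "circumference"
    description = "Compute the circumference of a circle given its radius."

    def invoke(self, radius):
        return float(radius) * 2.0 * pi

def test_base_tool_invoke_local():
    tool = CircumferenceTool()
    assert round(tool.invoke(2), 3) == 12.566
```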

def _invoke_tool(cls, tool, tool_input, mock_tracer, cassette_name):
    LLMObs.enable(ml_app=cls.ml_app, integrations_enabled=False, _tracer=mock_tracer)
    with get_request_vcr(subdirectory_name=cls.cassette_subdirectory_name).use_cassette(cassette_name):
        if LANGCHAIN_VERSION > (0, 1):
Reviewer (Contributor):

Is there any particular reason we're version-gating this here?

@yahya-mouman (Contributor Author) commented Aug 28, 2024:

> Note, you'll only need to trace BaseTool.invoke() for langchain V1 and up.

You mentioned this in the ticket description, so I was trying to make sure it's the case. I'm guessing that since it's in the langchain_community test suite it's always going to be the case?

def _arun(self, radius: int):
    raise NotImplementedError("This tool does not support async")

cassette_name = "langchain_tool_invoke_39.yaml" if PY39 else "langchain_tool_invoke.yaml"
Reviewer (Contributor):

Same here; I'm not sure we need a cassette file for this test.

self,
span: Span,
tool_input: Union[str, Dict[str, Any], Any],
tool_output: Any,
Static analysis bot:

🟠 Code Quality Violation

do not use Any, use a concrete type

Use the Any type very carefully. Most of the time, the Any type is used because we do not know exactly what type is being used. If you want to specify that a value can be of any type, use object instead of Any.


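A hedged illustration of the fix the linter is asking for: `object` accepts any value, like `Any`, but without disabling type checking at use sites. The function name and body below are illustrative, mirroring only the shape of the flagged signature.

```python
from typing import Dict, Union

# Before (triggers the violation):
#   tool_input: Union[str, Dict[str, Any], Any], tool_output: Any
# After: object expresses "any value" without turning off type checking.
def set_tool_io(
    tool_input: Union[str, Dict[str, object], object],
    tool_output: object,
) -> None:
    # (tagging logic elided in this sketch)
    pass
```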

@yahya-mouman yahya-mouman marked this pull request as ready for review August 28, 2024 18:42
@yahya-mouman yahya-mouman requested review from a team as code owners August 28, 2024 18:42
# Conflicts:
#	ddtrace/llmobs/_integrations/langchain.py
#	tests/contrib/langchain/test_langchain_llmobs.py
    result in a 1-span trace with a tool span.
    """
    if langchain_core is None:
        pytest.skip("langchain-core not installed which is required for this test.")
Reviewer (Collaborator):

Very small nit: I think it's ever-so-slightly preferable to use a skipif mark in this case.

@yahya-mouman (Contributor Author):

I agree, but the issue is that this specific dependency isn't needed in all tests, and, if I understand correctly, fixtures aren't available at the level of skipif annotations. So in order to have it available there, I'd need to import it for the whole file, which isn't optimal since it's only used in a couple of tests.

Do you know of any workaround for this?

Reviewer (Collaborator):

I don't know of a workaround if you want to keep using langchain_core as a fixture, no.

My personal taste is that it's more traceable to import langchain_core in this module and skipif when it's absent than to go sort out why the langchain_core fixture is None.

It looks like there are ~10 tests using the langchain_core fixture, so it's going to get loaded in just about all CI cases anyway?

Anyway, it's a very small nit. :)

@yahya-mouman (Contributor Author):

Yeah, I completely understand your point. I'll keep it as it is for now, since I hadn't authored the file before this PR and wouldn't want to change the existing structure until I speak to the original authors.
Thanks for giving it a look, Romain! 🙇

@yahya-mouman (Contributor Author):

Split into these two PRs:
PR 1: #10476
PR 2 (depends on PR 1): #10477
