
feat(llmobs): submit langchain tool.invoke tool spans #10410

Closed
wants to merge 19 commits into main from yahya/trace-base-tool

Conversation

@yahya-mouman (Contributor) commented Aug 27, 2024

This PR adds instrumentation to submit langchain tool.invoke/ainvoke tool spans to APM and LLM Observability.

The tool spans sent to APM will contain the following I/O data and tags:
langchain.request.tool.name
langchain.request.tool.description
langchain.request.tool.metadata
langchain.request.tool.tags
langchain.request.input
langchain.request.config
langchain.response.output
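For orientation, here is a minimal sketch of how a traced invoke wrapper might apply the tags listed above. `FakeSpan` is a hypothetical stand-in for ddtrace's `Span`, and truncation of I/O values is elided; only the tag names come from this PR.

```python
import json

# FakeSpan is illustrative only; the real integration tags a ddtrace Span.
class FakeSpan:
    def __init__(self):
        self.tags = {}

    def set_tag_str(self, key, value):
        self.tags[key] = value

def tag_tool_span(span, name, description, metadata, tags, tool_input, config):
    # Tag names mirror the list above.
    if name:
        span.set_tag_str("langchain.request.tool.name", str(name))
    if description:
        span.set_tag_str("langchain.request.tool.description", str(description))
    if metadata:
        span.set_tag_str("langchain.request.tool.metadata", json.dumps(metadata))
    if tags:
        span.set_tag_str("langchain.request.tool.tags", ",".join(tags))
    if tool_input:
        span.set_tag_str("langchain.request.input", str(tool_input))
    if config:
        span.set_tag_str("langchain.request.config", json.dumps(config, default=str))

span = FakeSpan()
tag_tool_span(
    span,
    name="circumference",
    description="Compute the circumference of a circle",
    metadata={"version": 1},
    tags=["math"],
    tool_input={"radius": 2},
    config={"run_name": "demo"},
)
```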

The tool spans sent to LLM Observability will contain the following I/O data:
input.value: a text field annotating the tool input
output.value: The output of the tool (can be any type)

Ticket: https://datadoghq.atlassian.net/browse/MLOB-935

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions bot commented Aug 27, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/feat-llmobs-submit-langchain-tool-spans-d7274ae79d3808e9.yaml  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke.json  @DataDog/apm-python
ddtrace/contrib/internal/langchain/patch.py                             @DataDog/ml-observability
ddtrace/llmobs/_integrations/langchain.py                               @DataDog/ml-observability
tests/contrib/langchain/test_langchain_community.py                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability

datadog-dd-trace-py-rkomorn bot commented Aug 27, 2024

Datadog Report

Branch report: yahya/trace-base-tool
Commit report: 7ec77b8
Test service: dd-trace-py

❌ 3 Failed (0 Known Flaky), 8351 Passed, 17030 Skipped, 1h 29m 22.5s Total Time

❌ Failed Tests (3)

  • test_base_tool_ainvoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (2 spans):
     Received more spans (3) than expected (2). Received unmatched spans: 'langchain.request'
    
  • test_base_tool_ainvoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_ainvoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (2 spans):
     Received fewer spans (1) than expected (2). Expected unmatched spans: 'langchain.request'
    
  • test_base_tool_invoke - test_langchain_community.py - Details

     At request <Request GET /test/session/snapshot >:
        At snapshot (token='tests.contrib.langchain.test_langchain_community.test_base_tool_invoke'):
         - Directory: /snapshots
         - CI mode: 0
         - Trace File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke.json
         - Stats File: /snapshots/tests.contrib.langchain.test_langchain_community.test_base_tool_invoke_tracestats.json
         At compare of 1 expected trace(s) to 1 received trace(s):
          At trace 'langchain.request' (1 spans):
     Received more spans (4) than expected (1). Received unmatched spans: 'langchain.request', 'langchain.request', 'langchain.request'
    

)
if tool_input:
    span.set_tag_str("langchain.request.input", integration.trunc(str(tool_input)))
if config:
@yahya-mouman (Contributor Author):

Is this enough, or should I parse the configs one by one and add them?

Reviewer (Contributor):

  1. We don't need to truncate config, as that isn't a direct I/O to the tool.
  2. We can probably just json-dump the config, as it is a TypedDict (see source).
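The second point can be sketched as follows. `RunnableConfig` in langchain-core is a TypedDict, i.e. a plain dict at runtime, so `json.dumps` with a `default` fallback handles entries that aren't JSON-serializable (such as callback objects). The config values here are illustrative.

```python
import json

# A RunnableConfig-like dict; at runtime a TypedDict is just a dict.
config = {
    "run_name": "circumference",
    "tags": ["math"],
    "recursion_limit": 25,
    "callbacks": [object()],  # not JSON-serializable without a fallback
}

# default=str converts anything json can't encode into its str() form.
serialized = json.dumps(config, default=str)
```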

Comment on lines 1040 to 1045

attrs={
    # Comment: Should we have more info in the log?
    "tool_name": instance.__self__.name or "",
    "input": tool_input,
    "config": config or "",
},
@yahya-mouman (Contributor Author):

Should we have more info in the log?



@with_traced_module
async def traced_base_tool_ainvoke(langchain, pin, func, instance, args, kwargs):
@yahya-mouman (Contributor Author):

Is there a better way to trace the async variant besides duplicating the code?

Reviewer (Contributor):

Unfortunately nope 😢 I think there are a couple of ways to depollute the patch module (extract the tracing functions into separate files, reuse common code in the sync/async functions), but we can keep this for now.

Reviewer (Contributor):

Same comments apply to the async function as above.

@@ -965,6 +965,170 @@ def traced_similarity_search(langchain, pin, func, instance, args, kwargs):
return documents


@with_traced_module
def traced_base_tool_invoke(langchain, pin, func, instance, args, kwargs):
@yahya-mouman (Contributor Author):

I couldn't think of any metrics to emit from this function.

Reviewer (Contributor):

Are you talking about Datadog metrics? If so, we can ignore this; that's out of scope for our team and this integration.

@yahya-mouman (Contributor Author):

Don't we use any of the metrics emitted from the langchain integration?

pr-commenter bot commented Aug 27, 2024

Benchmarks

Benchmark execution time: 2024-08-30 13:17:27

Comparing candidate commit 34096be in PR branch yahya/trace-base-tool with baseline commit 5b394f1 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 353 metrics, 47 unstable metrics.

@Yun-Kim (Contributor) left a comment:

Let's split out the APM integration and LLMObs integration into separate PRs for easier readability.

ddtrace/contrib/internal/langchain/patch.py (outdated; thread resolved)

span = integration.trace(
    pin,
    "%s.%s.%s" % (func.__module__, func.__class__.__name__, func.__self__.__class__.__name__),
Reviewer (Contributor):

I think here I'd prefer to just use the tool class name instead of the entire module/path, for simplicity/clarity. WDYT?
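The trade-off in concrete terms, using a hypothetical tool class:

```python
# CircumferenceTool is illustrative, standing in for a BaseTool subclass.
class CircumferenceTool:
    name = "circumference"

instance = CircumferenceTool()

# Option A: full dotted path, as in the snippet above (verbose).
full_name = "%s.%s" % (type(instance).__module__, type(instance).__name__)

# Option B (the reviewer's preference): just the tool class name.
short_name = type(instance).__name__
```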


tool_output = None
try:
    if instance.name:
        span.set_tag_str("langchain.request.tool.name", integration.trunc(str(instance.name)))
Reviewer (Contributor):

Truncation is only ever performed on inputs/outputs, so no need to do it here.

try:
    if instance.name:
        span.set_tag_str("langchain.request.tool.name", integration.trunc(str(instance.name)))
    if integration.is_pc_sampled_span(span):
Reviewer (Contributor):

pc_sampling refers to prompt/completion sampling, or inputs/outputs in this case. We can safely tag most of the items below without checking is_pc_sampled_span(...); the check is only needed for the input and the return value.
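In other words, only the prompt/completion-style fields go behind the gate. A sketch, with `is_pc_sampled_span` stubbed out (in ddtrace it is a rate-based sampling check) and dict-based tagging standing in for span tags:

```python
def is_pc_sampled_span(span, sample_rate=1.0):
    # Stub: the real check samples only a fraction of spans for I/O capture.
    return sample_rate >= 1.0

def tag_tool_span(span_tags, span, tool_name, tool_input, tool_output):
    # Metadata-style tags: always safe to set, no sampling gate needed.
    if tool_name:
        span_tags["langchain.request.tool.name"] = str(tool_name)
    # Prompt/completion-style data: gated behind the sampling check.
    if is_pc_sampled_span(span):
        if tool_input:
            span_tags["langchain.request.input"] = str(tool_input)
        if tool_output is not None:
            span_tags["langchain.response.output"] = str(tool_output)
    return span_tags

tags = tag_tool_span({}, None, "circumference", {"radius": 2}, 12.566)
```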


from math import pi

cassette_name = "langchain_tool_invoke.yaml"
with request_vcr.use_cassette(cassette_name):
Reviewer (Contributor):

AFAIK tools can be called without a network connection. Why do we need a cassette for this?

@yahya-mouman (Contributor Author):

Yes, gotcha! I was thinking of adding the tool to an LLM, but opted for a simple test instead and removed the cassette in an unpushed commit. Thanks!
I still left the option to call invoke with a cassette on the LLMObs side for any future agent tests.
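The network-free variant the reviewer is suggesting might look like this; `CircumferenceTool` is illustrative, not the PR's actual test code:

```python
from math import pi

# A purely local tool: no HTTP call, so no VCR cassette is needed.
class CircumferenceTool:
    name = "circumference"
    description = "Compute the circumference of a circle given its radius."

    def invoke(self, radius):
        return float(radius) * 2.0 * pi

def test_base_tool_invoke_local():
    tool = CircumferenceTool()
    assert round(tool.invoke(2), 3) == 12.566
```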

def _invoke_tool(cls, tool, tool_input, mock_tracer, cassette_name):
    LLMObs.enable(ml_app=cls.ml_app, integrations_enabled=False, _tracer=mock_tracer)
    with get_request_vcr(subdirectory_name=cls.cassette_subdirectory_name).use_cassette(cassette_name):
        if LANGCHAIN_VERSION > (0, 1):
Reviewer (Contributor):

Is there any particular reason we're version-gating this here?

@yahya-mouman (Contributor Author) commented Aug 28, 2024:

> Note, you'll only need to trace BaseTool.invoke() for langchain V1 and up.

You mentioned this in the ticket description, so I was trying to make sure it's the case. I'm guessing that since it's in the langchain_community test suite it's always going to be the case?

def _arun(self, radius: int):
    raise NotImplementedError("This tool does not support async")

cassette_name = "langchain_tool_invoke_39.yaml" if PY39 else "langchain_tool_invoke.yaml"
Reviewer (Contributor):

Same here; I'm not sure we need a cassette file for this test.

self,
span: Span,
tool_input: Union[str, Dict[str, Any], Any],
tool_output: Any,
Static analysis bot:

🟠 Code Quality Violation

do not use Any, use a concrete type

Use the Any type very carefully. Most of the time, the Any type is used because we do not know exactly what type is being used. If you want to specify that a value can be of any type, use object instead of Any.


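A hedged illustration of the fix the linter is asking for: `object` accepts any value, like `Any`, but without disabling type checking at use sites. The function name and body below are illustrative, mirroring only the shape of the flagged signature.

```python
from typing import Dict, Union

# Before (triggers the violation):
#   tool_input: Union[str, Dict[str, Any], Any], tool_output: Any
# After: object expresses "any value" without turning off type checking.
def set_tool_io(
    tool_input: Union[str, Dict[str, object], object],
    tool_output: object,
) -> None:
    # (tagging logic elided in this sketch)
    pass
```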

@yahya-mouman yahya-mouman marked this pull request as ready for review August 28, 2024 18:42
@yahya-mouman yahya-mouman requested review from a team as code owners August 28, 2024 18:42
# Conflicts:
#	ddtrace/llmobs/_integrations/langchain.py
#	tests/contrib/langchain/test_langchain_llmobs.py
    result in a 1-span trace with a tool span.
    """
    if langchain_core is None:
        pytest.skip("langchain-core not installed which is required for this test.")
Reviewer (Collaborator):

Very small nit: I think it's ever-so-slightly preferable to use a skipif mark in this case.

@yahya-mouman (Contributor Author):

I agree, but the issue is that this specific dependency isn't needed in all tests, and, if I understand correctly, fixtures aren't available at the level of skipif annotations. So in order to have it available there, I'd need to import it for the whole file, which isn't optimal since it's only used in a couple of tests.

Do you know of any workaround for this?

Reviewer (Collaborator):

I don't know of a workaround if you want to keep using langchain_core as a fixture, no.

My personal taste is that it's more traceable to import langchain_core in this module and skipif when it's absent than to go sort out why the langchain_core fixture is None.

It looks like there are ~10 tests using the langchain_core fixture, so it's going to get loaded in just about all CI cases anyway?

Anyway, it's a very small nit. :)

@yahya-mouman (Contributor Author):

Yeah, I completely understand your point. I'll keep it as it is for now, since I hadn't authored the file before this PR and wouldn't want to change the existing structure until I speak to the original authors.
Thanks for giving it a look, Romain! 🙇

@yahya-mouman (Contributor Author):

Split into these two PRs:
PR 1: #10476
PR 2 (depends on PR 1): #10477
