feat(llmobs): add retrieval and embedding spans [backport 2.9] #9358

Merged (1 commit) on May 22, 2024

@github-actions github-actions bot commented May 22, 2024

Backport d10e081 from #9134 to 2.9.

This PR adds support for submitting embedding and retrieval type spans for LLM Observability, both via LLMObs.{retrieval/embedding} and @ddtrace.llmobs.decorators.{retrieval/embedding}.
Additionally, this PR adds a public helper class ddtrace.llmobs.utils.Documents for users to create SDK-compatible input/output annotation objects for Embedding/Retrieval spans.

Embedding spans require a model name and optionally accept a model provider (defaults to custom). Embedding spans can be annotated with:

  • input: strings, dictionaries, or a list of dictionaries, which will be cast as Documents when submitted to LLMObs.
  • output: strings or any JSON serializable value.

Retrieval spans can be annotated with:

  • input: strings or any JSON serializable value.
  • output: strings, dictionaries, or a list of dictionaries, which will be cast as Documents when submitted to LLMObs.
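
The rules for the two span kinds can be summarized in a short sketch. This is not ddtrace code: `SpanSpec`, `make_span_spec`, and `documents_side` are hypothetical names that only mirror the description above (embedding spans need a model name, the provider falls back to custom, and the document-shaped annotation is the input for embedding spans and the output for retrieval spans). The description mentions no model for retrieval spans, so the sketch assumes none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanSpec:
    """Hypothetical summary of a span kind's rules (not part of ddtrace)."""
    kind: str
    model_name: Optional[str]
    model_provider: Optional[str]
    documents_side: str  # which annotation is cast as Documents


def make_span_spec(kind: str,
                   model_name: Optional[str] = None,
                   model_provider: Optional[str] = None) -> SpanSpec:
    if kind == "embedding":
        # Embedding spans require a model name; the provider is optional
        # and falls back to "custom" when it is not given.
        if not model_name:
            raise ValueError("embedding spans require a model name")
        return SpanSpec(kind, model_name, model_provider or "custom", "input")
    if kind == "retrieval":
        # Assumption: retrieval spans carry no model information.
        return SpanSpec(kind, None, None, "output")
    raise ValueError(f"unsupported span kind: {kind!r}")
```

Calling `make_span_spec("embedding")` without a model name raises `ValueError`, which reflects the requirement stated above.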

This PR also introduces the ddtrace.llmobs.utils.Documents class, which converts arguments into input/output documents for tagging. The Documents TypedDict object can contain the following fields:

  • name: str
  • id: str
  • text: str
  • score: int/float
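
A minimal sketch of the casting behavior described above, written as a standalone stand-in rather than the actual `ddtrace.llmobs.utils.Documents` implementation. The `Document` TypedDict and the `to_documents` helper are hypothetical names. Two details are assumptions of this sketch, not confirmed by the description: a bare string is wrapped as the `text` field, and keys outside the four listed fields are dropped.

```python
from typing import Any, Dict, List, TypedDict, Union


class Document(TypedDict, total=False):
    """The four fields listed in the PR description; all optional here."""
    name: str
    id: str
    text: str
    score: Union[int, float]


ALLOWED_FIELDS = ("name", "id", "text", "score")


def to_documents(
    value: Union[str, Dict[str, Any], List[Dict[str, Any]]]
) -> List[Document]:
    """Cast a string, a dictionary, or a list of dictionaries to documents."""
    if isinstance(value, str):
        # Assumption: a bare string becomes a document with only `text`.
        return [{"text": value}]
    if isinstance(value, dict):
        value = [value]
    if not isinstance(value, list):
        raise TypeError("expected a string, a dictionary, or a list of dictionaries")
    docs: List[Document] = []
    for item in value:
        if not isinstance(item, dict):
            raise TypeError("each document must be a dictionary")
        # Assumption: keep only the known fields and ignore anything else.
        docs.append({k: item[k] for k in ALLOWED_FIELDS if k in item})
    return docs
```

With this helper, all three accepted input shapes normalize to a list of dictionaries, which is the form a retrieval output or embedding input would take after casting.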

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

(cherry picked from commit d10e081)
@github-actions (bot) requested a review from a team as a code owner on May 22, 2024, 23:30
@github-actions (bot) added the changelog/no-changelog and MLObs labels on May 22, 2024
@Yun-Kim enabled auto-merge (squash) on May 22, 2024, 23:31
@Yun-Kim added and removed the changelog/no-changelog label on May 22, 2024
@Yun-Kim merged commit 487ae3b into 2.9 on May 22, 2024
17 checks passed
@Yun-Kim deleted the backport-9134-to-2.9 branch on May 22, 2024, 23:49
Labels: changelog/no-changelog (a changelog entry is not required for this PR), MLObs (ML Observability, LLMObs)