
[WIP] Add Initial Support for Instrumenting OpenAI Python Library - Chat Completion Create #2759

Draft
karthikscale3 wants to merge 24 commits into main

Conversation


@karthikscale3 karthikscale3 commented Jul 31, 2024

Description

This PR adds support for tracing the official Python openai library.

This pull request introduces initial support for instrumenting the OpenAI Python library, specifically targeting the chat.completion.create method. This implementation aligns with the GenAI semantic conventions.

Fixes #1796

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

This is a work-in-progress PR at the moment; I plan to add unit tests and update this section.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@karthikscale3 karthikscale3 marked this pull request as draft July 31, 2024 07:15
@xrmx xrmx (Contributor) left a comment

Added a first round of comments, thanks!

build-backend = "hatchling.build"

[project]
name = "opentelemetry-instrumentation-openai"

Unfortunately this name is already taken and we have to sort out how to handle that.


This could be really tricky. Do we have flexibility regarding naming, or should we be following a certain convention for naming?


The convention for naming would suggest this name unfortunately 😓


I suggest opentelemetry-instrumentation-openai-v2 and the following plan:

  • we'll start with this name and evolve it along with semconv to have feature parity with opentelemetry-instrumentation-openai
  • we'll need to check with @nirga, but based on the previous discussions they should, at some point, be able to publish opentelemetry-instrumentation-openai package from OTel
  • when Traceloop and we are ready, we'll start publishing opentelemetry-instrumentation-openai v2 from OTel and stop publishing opentelemetry-instrumentation-openai-v2.

This would let us unblock this work and give Traceloop time to migrate. All the details need further discussion (and of course the name I came up with is almost random - any other suggestions are welcome).

from pydantic import BaseModel, ConfigDict, Field


class SpanAttributes:
@xrmx xrmx Jul 31, 2024

You should use the attributes from the semantic conventions package instead of duplicating them here, see https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-semantic-conventions/src/opentelemetry/semconv/_incubating/attributes/gen_ai_attributes.py

It's fine to add the missing ones here.
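
For illustration, a minimal sketch of what reusing the published constants could look like (the helper shown here is hypothetical; the constant comes from the incubating module linked above):

from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes
from opentelemetry.trace import Span


def set_request_model(span: Span, model: str) -> None:
    # GEN_AI_REQUEST_MODEL resolves to the "gen_ai.request.model" attribute key,
    # so there is no need to redefine it in a local SpanAttributes class.
    span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MODEL, model)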

@@ -0,0 +1,8 @@
openai==1.37.1

You need to add all the openai dependencies to keep things reproducible

RESPONSE = "response"


class LLMSpanAttributes(BaseModel):

Could you please elaborate a bit on why we need pydantic for this?


It's mainly an extra layer of validation. At the patch level, the flow is the following (see the sketch after this list):

  1. prepare an object containing the span attributes needed
  2. validate that required fields are present, e.g. gen_ai.operation.name and gen_ai.request.model
  3. save those attributes upon successful validation
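
For context, a minimal sketch of that pattern, assuming pydantic v2 (field names beyond the two required ones are illustrative):

from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class LLMSpanAttributes(BaseModel):
    model_config = ConfigDict(populate_by_name=True, extra="allow")

    # Required: instantiation raises a ValidationError if these are missing.
    gen_ai_operation_name: str = Field(..., alias="gen_ai.operation.name")
    gen_ai_request_model: str = Field(..., alias="gen_ai.request.model")
    # Optional request parameter.
    gen_ai_request_temperature: Optional[float] = Field(None, alias="gen_ai.request.temperature")


attrs = LLMSpanAttributes(**{"gen_ai.operation.name": "chat", "gen_ai.request.model": "gpt-4"})
# model_dump(by_alias=True) then yields the dotted attribute names to set on the span.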


Wouldn't we validate it with tests? If something fails at runtime, our validation would not really help anyone.


linux-foundation-easycla bot commented Aug 7, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

):
    set_span_attribute(
        span,
        SpanAttributes.LLM_SYSTEM_FINGERPRINT,

this is not in semconv, could you please send the PR to add it there?



def set_span_attributes(span: Span, attributes: dict):
    for field, value in attributes.model_dump(by_alias=True).items():
        set_span_attribute(span, field, value)

Which attributes will it produce? What would their names be?

we should record attributes documented in the spec https://github.com/open-telemetry/semantic-conventions/blob/v1.27.0/docs/gen-ai/gen-ai-spans.md#genai-attributes
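
For reference, a minimal sketch of the attributes and span name the v1.27.0 spec describes (values are illustrative; constant names are those published in the incubating gen_ai_attributes module):

from opentelemetry import trace
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

tracer = trace.get_tracer(__name__)

# The spec recommends naming the span "{gen_ai.operation.name} {gen_ai.request.model}".
with tracer.start_as_current_span("chat gpt-4") as span:
    span.set_attribute(GenAIAttributes.GEN_AI_OPERATION_NAME, "chat")
    span.set_attribute(GenAIAttributes.GEN_AI_SYSTEM, "openai")
    span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MODEL, "gpt-4")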


span.add_event(
    name=SpanAttributes.LLM_CONTENT_PROMPT,
    attributes={
        SpanAttributes.LLM_PROMPTS: prompt,

it should be an event

if usage is not None:
    set_span_attribute(
        span,
        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,

please use https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-semantic-conventions/src/opentelemetry/semconv/_incubating/attributes/gen_ai_attributes.py instead:

SpanAttributes.LLM_USAGE_PROMPT_TOKENS is deprecated both in the spec and in the semconv package.

We might need to update to the latest semconv package version to use all new attributes.
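
In other words, something along these lines (a sketch that continues the snippet above; set_span_attribute is the PR's helper, and the usage fields follow the OpenAI response object):

from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

if usage is not None:
    # gen_ai.usage.input_tokens / gen_ai.usage.output_tokens replace the
    # deprecated prompt/completion token attribute names.
    set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)
    set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens)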

@lmolkova lmolkova (Contributor) left a comment

Great start, thanks a lot for doing this!

lmolkova added a commit to open-telemetry/opentelemetry-specification that referenced this pull request Aug 28, 2024
…ween external and otel instrumentations (#4187)

Some package managers (PyPI) don't provide means to reserve namespaces
for projects. We also have a number of **external** instrumentation
libraries in Python that follow current guidance and use the
`opentelemetry-instrumentation-{component}` naming pattern.

These libraries are hard to distinguish from otel-authored ones. Also,
when someone (legitimately following existing guidance) creates an
external instrumentation package like this, it blocks our ability to
have OTel-authored instrumentation with this 'good' name.

See
open-telemetry/opentelemetry-python-contrib#2759 (comment)
for a real-life example.

## Changes

This PR changes the recommendation to:
- otel-authored instrumentation should use the
`opentelemetry-instrumentation-*` pattern
- external instrumentation should not use this pattern and should prefix
the lib name with their company/project/etc. name

* ~~[ ] Related issues #~~
* ~~[ ] Related [OTEP(s)](https://github.com/open-telemetry/oteps) #~~
* ~~[ ] Links to the prototypes (when adding or changing features)~~
* [x]
[`CHANGELOG.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md)
file updated for non-trivial changes
* ~~[ ]
[`spec-compliance-matrix.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md)
updated if necessary~~

---------

Co-authored-by: Armin Ruech <[email protected]>
@lzchen lzchen added the gen-ai (Related to generative AI) label Aug 28, 2024
def _instrument(self, **kwargs):
    """Enable OpenAI instrumentation."""
    tracer_provider = kwargs.get("tracer_provider")
    tracer = get_tracer(__name__, "", tracer_provider)

nit

Suggested change
tracer = get_tracer(__name__, "", tracer_provider)
tracer = get_tracer(__name__, "", tracer_provider, schema_url=Schemas.V1_27_0)
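
A side note on the suggestion: Schemas in opentelemetry.semconv.schemas is an enum, so if get_tracer expects a plain URL string this would typically be written as follows (a sketch continuing the snippet above):

from opentelemetry.semconv.schemas import Schemas

# .value yields the "https://opentelemetry.io/schemas/1.27.0" URL string.
tracer = get_tracer(__name__, "", tracer_provider, schema_url=Schemas.V1_27_0.value)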

span = tracer.start_span(
    name=span_name,
    kind=SpanKind.CLIENT,
    context=set_span_in_context(trace.get_current_span()),

there should be no need, this is the default behavior

Suggested change
context=set_span_in_context(trace.get_current_span()),


agree, will remove


except Exception as error:
    span.set_status(Status(StatusCode.ERROR, str(error)))
    span.set_attribute("error.type", error.__class__.__name__)

from opentelemetry.semconv.attributes import error_attributes
Suggested change
span.set_attribute("error.type", error.__class__.__name__)
span.set_attribute(error_attributes.ERROR_TYPE, type(error).__qualname__)

error.type should be fully qualified


agree will fix

set_span_attribute(
    span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, result.model
)
print(result)

oops

if choice.finish_reason:
    set_span_attribute(
        span,
        GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS,

this should be an array of all reasons returned

  finish_reasons = []

  for choice in choices:
      finish_reasons.append(choice.finish_reason or "error")
  
  set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, finish_reasons)

Comment on lines 160 to 161
or kwargs.get("k")
or kwargs.get("top_k")

openai does not have top_k or k parameters, do we need them?


nope, will be removed

    else None
)
top_k = (
    kwargs.get("n")

n represents the number of choices, while Claude has top_k, which is "Only sample from the top K options for each subsequent token" - they don't seem to be related.

GenAIAttributes.GEN_AI_SYSTEM: GenAIAttributes.GenAiSystemValues.OPENAI.value,
GenAIAttributes.GEN_AI_REQUEST_MODEL: model or kwargs.get("model"),
GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE: kwargs.get("temperature"),
GenAIAttributes.GEN_AI_REQUEST_TOP_K: top_k,

see my previous comments about top_k

Suggested change
GenAIAttributes.GEN_AI_REQUEST_TOP_K: top_k,

# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.48b0.dev"

I think it should be the following now:

Suggested change
__version__ = "0.48b0.dev"
__version__ = "0.49b0.dev"


We really need some tests :) Happy to help with additional test cases if you can create the first happy-path test with the tool you mentioned that uses recorded content.
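
For the recorded-content approach, a first happy-path test could look roughly like the sketch below, using vcrpy cassettes and the SDK's in-memory exporter. The OpenAIInstrumentor name, cassette path, and asserted attribute are assumptions based on this PR, not the final API.

import vcr
from openai import OpenAI
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from opentelemetry.instrumentation.openai import OpenAIInstrumentor  # assumed entry point

# Record real API responses once, then replay them in CI without credentials.
my_vcr = vcr.VCR(cassette_library_dir="tests/cassettes", filter_headers=["authorization"])


@my_vcr.use_cassette("chat_completion.yaml")
def test_chat_completion_creates_span():
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    OpenAIInstrumentor().instrument(tracer_provider=provider)

    client = OpenAI(api_key="test-key")
    client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": "hi"}]
    )

    spans = exporter.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].attributes["gen_ai.request.model"] == "gpt-4"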

__name__,
"",
tracer_provider,
schema_url="https://opentelemetry.io/schemas/1.27.0",
@lmolkova lmolkova Sep 10, 2024

why not

from opentelemetry.semconv.schemas import Schemas

...

schema_url=Schemas.V1_27_0

?

if exc_type is not None:
    self.span.set_status(Status(StatusCode.ERROR, str(exc_val)))
    self.span.set_attribute(
        ErrorAttributes.ERROR_TYPE, exc_type.__name__

Suggested change
ErrorAttributes.ERROR_TYPE, exc_type.__name__
ErrorAttributes.ERROR_TYPE, exc_type.__qualname__

def __exit__(self, exc_type, exc_val, exc_tb):
    try:
        if exc_type is not None:
            self.span.set_status(Status(StatusCode.ERROR, str(exc_val)))

nit: consider moving error status reporting to a helper method
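
A minimal sketch of such a helper (the name is illustrative):

from opentelemetry.semconv.attributes import error_attributes
from opentelemetry.trace import Span
from opentelemetry.trace.status import Status, StatusCode


def _record_error(span: Span, error: BaseException) -> None:
    """Set the error status and a fully qualified error.type on the span."""
    span.set_status(Status(StatusCode.ERROR, str(error)))
    span.set_attribute(error_attributes.ERROR_TYPE, type(error).__qualname__)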


)
gen_ai_usage_total_tokens: Optional[float] = Field(
    None,
    alias="gen_ai.usage.total_tokens",

please remove unused attributes

@truptiparkar7

@karthikscale3 This is great! Thanks for starting this! We are interested in using this as well. How can we help prioritize getting it approved soon? Also, is there any documentation/README with an example and steps on how users can use this?

Labels: gen-ai (Related to generative AI)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Contributing an OpenAI instrumentor.
6 participants