feat: Streaming azure openai #244

Draft · wants to merge 35 commits into main

Conversation

@ZhongpinWang (Contributor) commented on Oct 25, 2024

Context

AI/gen-ai-hub-sdk-js-backlog#56.

Support streaming for the azure-openai client in foundation-models.

Check sample-code for usage.

For reviewers: the PR is still WIP (tests are missing, the linter is failing), but streaming already works. I would appreciate your opinion on the API design, overall structure, missing functionality, etc., for further improvement.
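For a rough idea of the consumption pattern, a sketch is below (the client, method, and helper names are placeholders based on this PR's discussion, not necessarily the final API):

import { AzureOpenAiChatClient } from '@sap-ai-sdk/foundation-models';

// Sketch only: names and signatures may differ from the final API.
const response = await new AzureOpenAiChatClient('gpt-35-turbo').stream({
  messages: [{ role: 'user', content: 'Tell me a short story.' }]
});

// Iterate the wrapped chunk stream and print the delta content of each chunk.
for await (const chunk of response.stream) {
  process.stdout.write(chunk.getDeltaContent() ?? '');
}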

Definition of Done

  • Code is tested (Unit, E2E)
  • Error handling created / updated & covered by the tests above
  • Documentation updated
  • (Optional) Aligned changes with the Java SDK
  • (Optional) Release notes updated

@deekshas8 (Contributor) left a comment

Review still WIP

* Azure OpenAI chat completion stream response.
*/
export class AzureOpenAiChatCompletionStreamResponse<T> {
private _usage: AzureOpenAiCompletionUsage | undefined;
Contributor
[q] Aren't these properties duplicated? (I see they're part of the chunk response type as well.)

Contributor
I have strong reservations about introducing the additional wrapper AzureOpenAiChatCompletionStreamResponse with duplicated properties.
In my view, we should keep the API response simple: just a stream response of chunks (without the wrapper). streamContent is already a simplification / additional convenience.

If the user wants to access finish_reason or usage, we already provide util functions in the AzureOpenAiChatCompletionStreamChunkResponse.
Given this, the added complexity from AzureOpenAiChatCompletionStreamResponse and the multiple *process functions does not seem worth it to me.

@MatKuhr @tomfrenken @KavithaSiva I would be interested in your thoughts

@ZhongpinWang (Contributor, Author) commented on Nov 4, 2024
Thanks for this feedback. There are some thoughts behind introducing the wrapper.

  1. Whether we add this wrapper or not, the difference for the user when accessing chunks is just for await (const chunk of response.stream) versus for await (const chunk of response). This does not make consumption more difficult.
  2. The original reason for adding it is that we want to support streamContent(). I discussed this with @MatKuhr. In the current Java implementation, the string stream is called streamChatCompletion() and plays a very important role by default. When users call this method, they have no direct access to the original raw chunk returned by OpenAI, i.e., no access to finish_reason or, especially, token usage. In Java, when users want this information, they either have to consume the original stream themselves, or, for a finish_reason like content_filter, an exception is thrown (in a separate thread) so that the user gets notified. But we don't want to throw errors in JS, since that would be inconsistent with our non-streaming API and would also stop the stream immediately. That is why I added this outer wrapper around the whole stream: so that the user at least has access to finish_reason and token usage when using streamContent().
  3. With or without AzureOpenAiChatCompletionStreamResponse, we need these *process functions for other things, such as wrapping the original OpenAI chunk in our util wrapper AzureOpenAiChatCompletionStreamChunkResponse or transforming it further into a string stream. They are not only used for token usage or finish_reason.
  4. Without the wrapper, there is no error handling from us for the different finish_reason values; for example, when the content filter stops the stream, the SDK will never know. This means every user would need to implement a big switch themselves, checking each chunk response and doing error handling on their own. One could argue that we implement some util functions and expose them for checking finish_reason, but the user has to know to use them. When we know these checks are necessary and can do them in place inside our SDK, why make the user do it? And note that this is not even possible when using streamContent().

I hope this clarifies the motivation a bit; let me know if you see it differently.
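For illustration, a sketch of the access pattern described in points 2 and 4 (streamContent, finishReason, and usage here are placeholders for whatever the wrapper ends up exposing):

// Consume only the convenience string stream...
for await (const text of response.streamContent()) {
  process.stdout.write(text);
}

// ...and still read finish reason and token usage afterwards, because the
// outer wrapper collected them while the chunks were processed.
console.log(response.finishReason);
console.log(response.usage);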

@deekshas8 (Contributor) commented on Nov 4, 2024
Thanks for explaining the thought behind the wrapper class.

As discussed, I would suggest we don't really need a separate streamContent() API; instead, we can provide a convenience method on the stream response object returned by .stream(). The user can call it if they need it.

  async *processContent(
    this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
  ): AsyncGenerator<string> {
   ...
  }

The above change will allow calling it on the stream object.

for await (const chunk of response.stream.processContent())

Also, maybe rename it to get* instead of process*?
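A sketch of how that generator body might look (getDeltaContent() is an assumed helper on the chunk wrapper, not necessarily its real name):

async *processContent(
  this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
): AsyncGenerator<string> {
  for await (const chunk of this) {
    // Assumed helper: extract the delta text of the first choice, if any.
    const content = chunk.getDeltaContent();
    if (content) {
      yield content;
    }
  }
}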

@ZhongpinWang removed the request for review from marikaner on November 4, 2024, 09:49
* A re-implementation of the `LineDecoder` from Python's httpx that handles
* incrementally reading lines from text.
*
* Https://github.com/encode/httpx/blob/920333ea98118e9cf617f246905d7b202510941c/httpx/_decoders.py#L258.
Member
Is there no good JS lib that does this already?

Contributor (Author)
Not that I know of. Maybe @deekshas8 has seen something like this? I suppose OpenAI implemented this for a good reason.
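For context, a simplified sketch of the incremental line splitting such a decoder performs (illustrative only; the real LineDecoder also normalizes \r and \r\n line endings):

// Simplified illustration, not the implementation under review.
class SimpleLineDecoder {
  private buffer = '';

  // Feed a text chunk; return the complete lines it finished.
  decode(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element may be a partial line; keep it for the next chunk.
    this.buffer = lines.pop() ?? '';
    return lines;
  }

  // Return any leftover text once the stream has ended.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = '';
    return rest ? [rest] : [];
  }
}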

@@ -0,0 +1 @@
{"choices":[],"created":1730125149,"id":"chatcmpl-ANKsHIdjvozwuOGpGI6rygvwSJH0I","model":"gpt-35-turbo","object":"chat.completion.chunk","system_fingerprint":"fp_808245b034","usage":{"completion_tokens":7,"prompt_tokens":14,"total_tokens":21}}
Member
We can also get an error during streaming. See here for an example.

We should probably add a test for this case, ensuring we throw in such cases.
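A rough shape for such a test (the mocking helper here is hypothetical; a real test would use the project's existing HTTP mocks):

it('throws when the stream delivers an error payload', async () => {
  // Hypothetical mock: the endpoint answers with an error event instead of chunks.
  mockStreamingResponse('{"error":{"message":"stream failed","code":"500"}}');

  const response = await client.stream({ messages });
  const consume = async () => {
    for await (const _chunk of response.stream) {
      // drain the stream until the error surfaces
    }
  };
  await expect(consume()).rejects.toThrow();
});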

this._usage = usage;
}

public get finishReason(): string {
Member
We could consider:

Suggested change
public get finishReason(): string {
public get finishReason(index = 0): string {

Technically, the finish reason might differ for different indices. But I doubt this is a frequent use case, and we could always add it later, as this would be a non-breaking change, right? If so, I'd leave it out for now.
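(Side note: TypeScript getters cannot take parameters, so an indexed variant would likely become a regular method. A sketch, with this.data standing in for wherever the raw response is kept:)

// Sketch: indexed accessor as a method, defaulting to the first choice.
public getFinishReason(choiceIndex = 0): string | undefined {
  return this.data.choices[choiceIndex]?.finish_reason;
}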
