feat: Streaming azure openai #244
base: main
Conversation
packages/foundation-models/src/azure-openai/azure-openai-chat-completion-stream.ts
packages/foundation-models/src/azure-openai/azure-openai-line-decoder.ts
Review still WIP
packages/foundation-models/src/azure-openai/azure-openai-stream.ts
...oundation-models/src/azure-openai/azure-openai-chat-completion-stream-chunk-response.test.ts
 * Azure OpenAI chat completion stream response.
 */
export class AzureOpenAiChatCompletionStreamResponse<T> {
  private _usage: AzureOpenAiCompletionUsage | undefined;
[q] Aren't these properties duplicated? (I see they're part of the chunk response type as well.)
I have strong reservations about introducing the additional wrapper `AzureOpenAiChatCompletionStreamResponse` with duplicated properties.
In my view, we should keep the API response simple: just a stream response of chunks, without the wrapper. `streamContent` is already a simplification/additional convenience.
If the user wants to access `finish_reason` or `usage`, we already provide the util functions in `AzureOpenAiChatCompletionStreamChunkResponse`.
Given this, the added complexity from `AzureOpenAiChatCompletionStreamResponse` and the multiple `*process` functions does not seem worth it to me.
@MatKuhr @tomfrenken @KavithaSiva I would be interested in your thoughts.
Thanks for this feedback. There are some thoughts behind introducing the wrapper.

- Adding this wrapper or not, the difference for the user when accessing the chunks is just using either `for await (const chunk of response.stream)` or `for await (const chunk of response)`. This won't make the consumption more difficult (see the consumption sketch below).
- The original reason for adding this is that we want to support `streamContent()`. I discussed this with @MatKuhr. In the current Java implementation, the string stream is called `streamChatCompletion()` and plays a very important role by default. When the user calls this method, they do not have direct access to the original raw chunk returned by OpenAI, i.e., the user has no access to `finish_reason` or, especially, `token_usage`. In Java, when the user wants this information, they either need to consume the original stream themselves, or, for a `finish_reason` like `content_filter`, exceptions are thrown (in a separate thread) so that the user gets notified. But we don't want to throw errors in JS, as this is inconsistent with our non-streaming API and would also stop the streaming immediately. That is why I added this outer wrapper around the whole stream: to let the user at least have access to `finish_reason` and `token_usage` when using `streamContent()`.
- With or without `AzureOpenAiChatCompletionStreamResponse`, we need these `*process` functions for other things, such as wrapping the original OpenAI chunk in our util wrapper `AzureOpenAiChatCompletionStreamChunkResponse` or transforming it further into a string stream. They are not only used for `token_usage` or `finish_reason`.
- There will be no error handling from us for the different `finish_reason` values; for example, when the content filter stops the stream, the SDK will never know. This means literally every user needs to implement a big switch case themselves, checking each chunk response and doing their own error handling. One could argue that we implement some util functions and expose them to the user for checking `finish_reason`, but the user has to know to use them. When we know these checks are necessary and can do them in place inside our SDK, why make the user do them? And note that this is not even possible when using `streamContent()`.

I hope this clarifies the motivation a bit; let me know if you see it differently.
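To make this concrete, here is a hedged consumption sketch of the proposed wrapper design. The client name `AzureOpenAiChatClient`, the package import, and the chunk accessor `getDeltaContent()` are assumptions based on this thread, not the final API:

```ts
// Hedged sketch of consuming the proposed wrapper design; client, import,
// and accessor names are assumptions from this thread, not the final API.
import { AzureOpenAiChatClient } from '@sap-ai-sdk/foundation-models';

const response = await new AzureOpenAiChatClient('gpt-35-turbo').stream({
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Chunk-level access through the wrapped stream.
for await (const chunk of response.stream) {
  process.stdout.write(chunk.getDeltaContent() ?? '');
}

// After the stream is fully consumed, the wrapper can expose the aggregated
// finish_reason and token_usage, even when only string content was streamed.
console.log(response.finishReason);
console.log(response.usage);
```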
Thanks for explaining the thought behind the wrapper class.
As discussed, I would suggest we don't really need a separate `streamContent()` API; instead, we can provide a convenience method on the stream response object returned by `.stream()`. The user can call it if they need it.

async *processContent(
  this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
): AsyncGenerator<string> {
  ...
}

The above change allows calling it on the stream object:

for await (const chunk of response.stream.processContent())

Also, maybe rename it with a `get` prefix instead of `process`?
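A minimal sketch of how such a method could look, assuming the stream yields `AzureOpenAiChatCompletionStreamChunkResponse` items with a hypothetical `getDeltaContent()` accessor (not necessarily the PR's actual accessor):

```ts
// Sketch only: getDeltaContent() is a hypothetical accessor for a chunk's
// delta content; the actual accessor in the PR may be named differently.
async *processContent(
  this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
): AsyncGenerator<string> {
  for await (const chunk of this) {
    const content = chunk.getDeltaContent();
    if (content) {
      yield content;
    }
  }
}
```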
 * A re-implementation of httpx's `LineDecoder` in Python that handles incrementally
 * reading lines from text.
 *
 * Https://github.com/encode/httpx/blob/920333ea98118e9cf617f246905d7b202510941c/httpx/_decoders.py#L258.
Is there no good JS lib that does this already?
Not that I know of. Maybe @deekshas8, have you seen something like this? I suppose OpenAI implemented this for a good reason.
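For reference, a minimal sketch of what such an incremental line decoder does: it buffers a trailing partial line across chunks and only emits complete lines. This is illustrative, not the PR's implementation; httpx's version additionally handles a `\r` at a chunk boundary that may be the first half of `\r\n`:

```ts
// Minimal illustrative line decoder: buffers an incomplete trailing line
// until the next chunk arrives. Unlike httpx's LineDecoder, this sketch
// does not special-case a chunk ending in '\r' that may be half of '\r\n'.
class SimpleLineDecoder {
  private buffer = '';

  // Feed a decoded text chunk and get back the complete lines it finishes.
  decode(text: string): string[] {
    this.buffer += text;
    const lines = this.buffer.split(/\r\n|\r|\n/);
    // The last element is the (possibly empty) unterminated remainder.
    this.buffer = lines.pop() ?? '';
    return lines;
  }

  // Emit any leftover text once the stream ends.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = '';
    return rest ? [rest] : [];
  }
}
```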
@@ -0,0 +1 @@
{"choices":[],"created":1730125149,"id":"chatcmpl-ANKsHIdjvozwuOGpGI6rygvwSJH0I","model":"gpt-35-turbo","object":"chat.completion.chunk","system_fingerprint":"fp_808245b034","usage":{"completion_tokens":7,"prompt_tokens":14,"total_tokens":21}}
We can also get an error during streaming. See here for an example.
We should probably add a test for this case, ensuring we throw in such cases.
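A hedged test sketch for that case; `createStreamFromChunks` is a placeholder fixture helper (not an actual util in this PR), and the error payload shape is only roughly what OpenAI error events look like:

```ts
// Hypothetical test sketch: the fixture helper and the exact error payload
// shape are placeholders, not part of this PR.
it('throws when the stream contains an error payload', async () => {
  const errorChunk = { error: { message: 'Rate limit exceeded', code: '429' } };
  const stream = createStreamFromChunks([errorChunk]); // placeholder helper

  const consume = async () => {
    for await (const chunk of stream) {
      // Consume chunks until the error payload is reached.
    }
  };
  await expect(consume()).rejects.toThrow();
});
```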
this._usage = usage;
}

public get finishReason(): string {
We could consider:

- public get finishReason(): string {
+ public finishReason(index = 0): string { // a getter cannot take parameters, so this becomes a method

Technically, the finish reason might differ for different indices. But I doubt this is a frequent use case, and we could always add it later, as this would be a non-breaking change, right? If so, I'd leave it out for now.
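Note that TypeScript getters cannot take parameters, so the index-aware variant is shown as a regular method above. A hedged sketch of how it could look, assuming the underlying data follows the OpenAI chunk schema where each choice carries an `index` and a nullable `finish_reason` (the PR's actual internal shape may differ):

```ts
// Hedged sketch: per-choice finish reason lookup. Assumes this.data follows
// the OpenAI chunk schema; finish_reason is null on intermediate chunks, so
// null is normalized to undefined here.
public finishReason(index = 0): string | undefined {
  return this.data.choices.find(choice => choice.index === index)
    ?.finish_reason ?? undefined;
}
```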
Context
AI/gen-ai-hub-sdk-js-backlog#56.
Support streaming for the `azure-openai` client in `foundation-models`. Check `sample-code` for usage.
For reviewers: the PR is still WIP, tests are missing, the linter is failing... But streaming already works. I would appreciate your opinion on the API design, overall structure, missing functionality, etc. for further improvement.
Definition of Done