feat: Streaming azure openai #244

Draft · wants to merge 35 commits into main

Conversation

@ZhongpinWang (Contributor) commented on Oct 25, 2024

Context

AI/gen-ai-hub-sdk-js-backlog#56.

Support streaming for the azure-openai client in foundation-models.

Check sample-code for usage.

For reviewers: the PR is still WIP (tests are missing, the linter is failing), but streaming already works. I would appreciate your opinion on the API design, overall structure, missing functionality, etc., for further improvement.
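For a rough idea of the consumption pattern, a sketch is below (the client, method, and helper names are placeholders based on this PR's discussion, not necessarily the final API):

import { AzureOpenAiChatClient } from '@sap-ai-sdk/foundation-models';

// Sketch only: names and signatures may differ from the final API.
const response = await new AzureOpenAiChatClient('gpt-35-turbo').stream({
  messages: [{ role: 'user', content: 'Tell me a short story.' }]
});

// Iterate the wrapped chunk stream and print the delta content of each chunk.
for await (const chunk of response.stream) {
  process.stdout.write(chunk.getDeltaContent() ?? '');
}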

Definition of Done

  • Code is tested (Unit, E2E)
  • Error handling created / updated & covered by the tests above
  • Documentation updated
  • (Optional) Aligned changes with the Java SDK
  • (Optional) Release notes updated

@deekshas8 (Contributor) left a comment

Review still WIP

* Azure OpenAI chat completion stream response.
*/
export class AzureOpenAiChatCompletionStreamResponse<T> {
private _usage: AzureOpenAiCompletionUsage | undefined;
Contributor
[q] Aren't these properties duplicated? (I see they're part of the chunk response type as well.)

Contributor
I have strong reservations about introducing the additional wrapper AzureOpenAiChatCompletionStreamResponse with duplicated properties.
In my view, we should keep the API response simple: just a stream response of chunks (without the wrapper). streamContent is already a simplification / additional convenience.

If the user wants to access finish_reason or usage, we already provide util functions in the AzureOpenAiChatCompletionStreamChunkResponse.
Given this, the added complexity from AzureOpenAiChatCompletionStreamResponse and the multiple *process functions does not seem worth it to me.

@MatKuhr @tomfrenken @KavithaSiva I would be interested in your thoughts

@ZhongpinWang (Contributor, Author) commented on Nov 4, 2024
Thanks for this feedback. There are some thoughts behind introducing the wrapper.

  1. Whether we add this wrapper or not, the difference for the user when accessing chunks is just for await (const chunk of response.stream) versus for await (const chunk of response). This does not make consumption more difficult.
  2. The original reason for adding it is that we want to support streamContent(). I discussed this with @MatKuhr. In the current Java implementation, the string stream is called streamChatCompletion() and plays a very important role by default. When users call this method, they have no direct access to the original raw chunk returned by OpenAI, i.e., no access to finish_reason or, especially, token usage. In Java, when users want this information, they either have to consume the original stream themselves, or, for a finish_reason like content_filter, an exception is thrown (in a separate thread) so that the user gets notified. But we don't want to throw errors in JS, since that would be inconsistent with our non-streaming API and would also stop the stream immediately. That is why I added this outer wrapper around the whole stream: so that the user at least has access to finish_reason and token usage when using streamContent().
  3. With or without AzureOpenAiChatCompletionStreamResponse, we need these *process functions for other things, such as wrapping the original OpenAI chunk in our util wrapper AzureOpenAiChatCompletionStreamChunkResponse or transforming it further into a string stream. They are not only used for token usage or finish_reason.
  4. Without the wrapper, there is no error handling from us for the different finish_reason values; for example, when the content filter stops the stream, the SDK will never know. This means every user would need to implement a big switch themselves, checking each chunk response and doing error handling on their own. One could argue that we implement some util functions and expose them for checking finish_reason, but the user has to know to use them. When we know these checks are necessary and can do them in place inside our SDK, why make the user do it? And note that this is not even possible when using streamContent().

I hope this clarifies the motivation a bit; let me know if you see it differently.
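For illustration, a sketch of the access pattern described in points 2 and 4 (streamContent, finishReason, and usage here are placeholders for whatever the wrapper ends up exposing):

// Consume only the convenience string stream...
for await (const text of response.streamContent()) {
  process.stdout.write(text);
}

// ...and still read finish reason and token usage afterwards, because the
// outer wrapper collected them while the chunks were processed.
console.log(response.finishReason);
console.log(response.usage);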

@deekshas8 (Contributor) commented on Nov 4, 2024
Thanks for explaining the thought behind the wrapper class.

As discussed, I would suggest we don't really need a separate streamContent() API; instead, we can provide a convenience method on the stream response object returned by .stream(). The user can call it if they need it.

  async *processContent(
    this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
  ): AsyncGenerator<string> {
   ...
  }

The above change will allow calling it on the stream object.

for await (const chunk of response.stream.processContent())

Also, maybe rename it to get* instead of process*?
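A sketch of how that generator body might look (getDeltaContent() is an assumed helper on the chunk wrapper, not necessarily its real name):

async *processContent(
  this: AzureOpenAiChatCompletionStream<AzureOpenAiChatCompletionStreamChunkResponse>
): AsyncGenerator<string> {
  for await (const chunk of this) {
    // Assumed helper: extract the delta text of the first choice, if any.
    const content = chunk.getDeltaContent();
    if (content) {
      yield content;
    }
  }
}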

@ZhongpinWang removed the request for review from marikaner on November 4, 2024, 09:49
* A re-implementation of the `LineDecoder` from Python's httpx that handles
* incrementally reading lines from text.
*
* Https://github.com/encode/httpx/blob/920333ea98118e9cf617f246905d7b202510941c/httpx/_decoders.py#L258.
Member
Is there no good JS lib that does this already?

Contributor (Author)
Not that I know of. Maybe @deekshas8 has seen something like this? I suppose OpenAI implemented this for a good reason.
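For context, a simplified sketch of the incremental line splitting such a decoder performs (illustrative only; the real LineDecoder also normalizes \r and \r\n line endings):

// Simplified illustration, not the implementation under review.
class SimpleLineDecoder {
  private buffer = '';

  // Feed a text chunk; return the complete lines it finished.
  decode(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element may be a partial line; keep it for the next chunk.
    this.buffer = lines.pop() ?? '';
    return lines;
  }

  // Return any leftover text once the stream has ended.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = '';
    return rest ? [rest] : [];
  }
}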

@@ -0,0 +1 @@
{"choices":[],"created":1730125149,"id":"chatcmpl-ANKsHIdjvozwuOGpGI6rygvwSJH0I","model":"gpt-35-turbo","object":"chat.completion.chunk","system_fingerprint":"fp_808245b034","usage":{"completion_tokens":7,"prompt_tokens":14,"total_tokens":21}}
Member
We can also get an error during streaming. See here for an example.

We should probably add a test for this case, ensuring we throw in such cases.
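A rough shape for such a test (the mocking helper here is hypothetical; a real test would use the project's existing HTTP mocks):

it('throws when the stream delivers an error payload', async () => {
  // Hypothetical mock: the endpoint answers with an error event instead of chunks.
  mockStreamingResponse('{"error":{"message":"stream failed","code":"500"}}');

  const response = await client.stream({ messages });
  const consume = async () => {
    for await (const _chunk of response.stream) {
      // drain the stream until the error surfaces
    }
  };
  await expect(consume()).rejects.toThrow();
});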

this._usage = usage;
}

public get finishReason(): string {
Member
We could consider:

Suggested change
public get finishReason(): string {
public get finishReason(index = 0): string {

Technically, the finish reason might differ for different indices. But I doubt this is a frequent use case, and we could always add it later, as this would be a non-breaking change, right? If so, I'd leave it out for now.
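(Side note: TypeScript getters cannot take parameters, so an indexed variant would likely become a regular method. A sketch, with this.data standing in for wherever the raw response is kept:)

// Sketch: indexed accessor as a method, defaulting to the first choice.
public getFinishReason(choiceIndex = 0): string | undefined {
  return this.data.choices[choiceIndex]?.finish_reason;
}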
