
LLM.aask: support external access to streamed response content #1389

Closed · wants to merge 3 commits

Conversation

htSun1998

Features

Addresses issue #1301 (comment)
openai_llm's streaming output was reworked so that LLM.aask can push the model's streamed output into a queue in real time, which makes it easy to expose streaming output through an HTTP endpoint.

  • When using the OpenAI API, LLM.aask can now accept a queue argument that temporarily stores the LLM's streamed response

Feature Docs

Example code is in examples/fastapi_stream.py.
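
A minimal sketch of the intended usage, assuming the queue argument described above and a None sentinel for end-of-stream (both the sentinel and the prompt are illustrative; the actual examples/fastapi_stream.py in this PR may differ):

import asyncio
import threading
from queue import Queue

from metagpt.llm import LLM


def consume(chunks: Queue):
    # Print streamed chunks as LLM.aask puts them into the queue.
    while True:
        chunk = chunks.get()
        if chunk is None:  # assumed end-of-stream sentinel
            break
        print(chunk, end="", flush=True)


async def main():
    chunks: Queue = Queue()
    consumer = threading.Thread(target=consume, args=(chunks,))
    consumer.start()

    llm = LLM()
    # Per this PR's description, aask pushes each streamed chunk into `chunks` in real time.
    await llm.aask("Write a haiku about queues", stream=True, queue=chunks)

    chunks.put(None)  # tell the consumer the stream is finished
    consumer.join()


if __name__ == "__main__":
    asyncio.run(main())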

Influence

A queue is used to store the LLM's streamed response so that it can be accessed externally.

Result

(screenshot of the streamed result omitted)

@shenchucheng
Collaborator

Hello, thank you very much for your submission. However, log_llm_stream can already meet this need:

import asyncio
import queue as sync_queue
import threading
from contextvars import ContextVar

import uvicorn
from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse

from metagpt.llm import LLM
from metagpt.logs import set_llm_stream_logfunc

# Per-request queue that the stream log hook writes into.
QUEUE_CONTEXTVAR: ContextVar[sync_queue.Queue] = ContextVar("queue")


def send_llm_stream(msg):
    # Called by MetaGPT for every streamed chunk once registered via set_llm_stream_logfunc.
    queue = QUEUE_CONTEXTVAR.get()
    queue.put(msg)


app = FastAPI()
llm = LLM()


async def generate_async_stream(queue):
    QUEUE_CONTEXTVAR.set(queue)
    requirement = (
        "Solve this math problem: the greatest common divisor of positive integers "
        "m and n is 6, and their least common multiple is 126. "
        "What is the smallest possible value of m + n?"
    )
    await llm.aask(requirement, stream=True)
    queue.put(None)  # signal end of stream


@app.get("/stream")
def stream():
    queue = sync_queue.Queue()

    def run_loop(queue):
        # Run the async aask call on its own event loop in a worker thread,
        # so the synchronous SSE generator below can consume the queue.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        loop.run_until_complete(generate_async_stream(queue))

    thread = threading.Thread(target=run_loop, args=(queue,))
    thread.start()

    def generate_sync_stream():
        while True:
            message = queue.get()
            if message is None:
                break
            # EventSourceResponse wraps each yielded string as an SSE "data:" event.
            yield message

    return EventSourceResponse(generate_sync_stream(), media_type="text/event-stream")


if __name__ == "__main__":
    set_llm_stream_logfunc(send_llm_stream)
    uvicorn.run(app=app, host="0.0.0.0", port=3000)
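
For reference, a small client for the /stream endpoint above could look like the sketch below; requests and the localhost URL are assumptions for illustration, not part of this thread:

# Illustrative SSE client for the /stream endpoint (requests is an assumption).
import requests

with requests.get("http://localhost:3000/stream", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip(), end="", flush=True)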

@geekan
Owner

geekan commented Jul 16, 2024

I think we may need to discuss which of the two is better.

@htSun1998
Author

Thanks!

@geekan
Owner

geekan commented Jul 18, 2024

Spaghetti code, or a callback?

@geekan
Owner

geekan commented Oct 20, 2024

After some discussion: since both solutions have problems of their own, the existing solution will be kept. If there is any other discussion you would like to start, please feel free to do so at any time.

@geekan closed this Oct 20, 2024