[RFC] 013 - Dall·E 3 Support #460
arvinxx
started this conversation in
RFC | Feature Development
Replies: 3 comments
After one day, the images expire, so they need to be cached locally.
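A minimal sketch of the caching idea, assuming the image bytes are downloaded once while the URL is still valid and are then served from local storage. The names `LocalImageCache`, `cacheOnce` and `Downloader` are hypothetical, and the in-memory `Map` is only a stand-in for whatever persistent store the client ends up using.

```typescript
// Hypothetical downloader signature: turns a (soon to expire) URL into raw bytes.
type Downloader = (url: string) => Promise<Uint8Array>;

// In-memory stand-in for a local image store, keyed by an image id.
class LocalImageCache {
  private store = new Map<string, Uint8Array>();

  save(id: string, bytes: Uint8Array): void {
    this.store.set(id, bytes);
  }

  load(id: string): Uint8Array | undefined {
    return this.store.get(id);
  }

  has(id: string): boolean {
    return this.store.has(id);
  }
}

// Return the cached bytes if present; otherwise download once and cache them.
async function cacheOnce(
  cache: LocalImageCache,
  id: string,
  url: string,
  download: Downloader,
): Promise<Uint8Array> {
  const cached = cache.load(id);
  if (cached !== undefined) return cached;
  const bytes = await download(url);
  cache.save(id, bytes);
  return bytes;
}
```

In a browser, `download` could wrap `fetch` and `arrayBuffer()`, and the `Map` could be replaced by a persistent store, so that the image survives after the remote URL has expired.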
Following ChatGPT's standard approach, the implementation is as follows:
There are several problems:
The planned solution is therefore:
Whenever a description of an image is given, use dalle to create the images and then summarize the prompts used to generate the images in plain text. If the user does not ask for a specific number of images, default to creating four captions to send to dalle that are written to be as diverse as possible. All captions sent to dalle must abide by the following policies:
namespace dalle {
// Create images from a text-only prompt.
type text2im = (params: {
// The resolution of the requested image, which can be wide, square, or tall. Use 1024x1024 (square) as the default unless the prompt suggests a wide image, 1792x1024, or a full-body portrait, in which case 1024x1792 (tall) should be used instead. Always include this parameter in the request.
size?: "1792x1024" | "1024x1024" | "1024x1792",
// The user's original image description, potentially modified to abide by the dalle policies. If the user does not suggest a number of captions to create, create four of them. If creating multiple captions, make them as diverse as possible. If the user requested modifications to previous images, the captions should not simply be longer, but rather it should be refactored to integrate the suggestions into each of the captions. Generate no more than 4 images, even if the user requests more.
prompts: string[],
// A list of seeds to use for each prompt. If the user asks to modify a previous image, populate this field with the seed used to generate that image from the image dalle metadata.
seeds?: number[],
}) => any;
}
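The constraints stated in the comments above (default to the square size, never more than four images) could be enforced on the client before the tool call is dispatched. The following is a sketch under that assumption; `normalizeText2Im` and `MAX_IMAGES` are hypothetical names, not part of the ChatGPT definition.

```typescript
type DalleSize = "1792x1024" | "1024x1024" | "1024x1792";

interface Text2ImParams {
  size?: DalleSize;
  prompts: string[];
  seeds?: number[];
}

// Upper bound taken from the prompt text: "Generate no more than 4 images".
const MAX_IMAGES = 4;

// Fill in the default size and trim prompts (and matching seeds) to the limit.
function normalizeText2Im(
  params: Text2ImParams,
): Text2ImParams & { size: DalleSize } {
  const prompts = params.prompts.slice(0, MAX_IMAGES);
  const seeds = params.seeds?.slice(0, prompts.length);
  return {
    size: params.size ?? "1024x1024",
    prompts,
    ...(seeds ? { seeds } : {}),
  };
}
```

For example, a call carrying five prompts and no `size` would be reduced to the first four prompts with `size` set to `"1024x1024"`.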
Background
Users are eager for DALL·E 3 model support:
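Product thinking
If we analyze how ChatGPT interacts, we find that it uses DALL·E as a "tool" of GPT.
We can enable drawing for GPT by configuring whether it is allowed to use dalle.
This matches my initial expectation of how dalle would be integrated. ChatGPT also gave me another insight: in agent mode, how should plugins be balanced against platform-native capabilities such as dalle? The answer is to treat every plugin, DALL·E 3, the code interpreter and similar features as a "tool". We can configure an assistant's prompt, and we can also configure its capabilities.
Capabilities are then extended through the tool set. Going forward, besides building in some official capabilities (such as dalle and speech conversion), lobe-chat will focus more on enriching external capabilities, that is, offering a variety of plugins.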
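The "everything is a tool" idea can be pictured as a data model in which an assistant is a prompt plus a list of tools, with built-in capabilities and plugins sharing one shape. This is only an illustrative sketch: `Tool`, `AgentConfig` and `withTool` are hypothetical names and do not describe lobe-chat's actual schema.

```typescript
// Built-in capabilities and external plugins share one "tool" shape.
type ToolKind = "builtin" | "plugin";

interface Tool {
  id: string;
  kind: ToolKind;
  description: string;
}

// An assistant is configured by its prompt and by the tools it may use.
interface AgentConfig {
  systemPrompt: string;
  tools: Tool[];
}

// Return a new config with the tool enabled (no duplicates by id).
function withTool(agent: AgentConfig, tool: Tool): AgentConfig {
  if (agent.tools.some((t) => t.id === tool.id)) return agent;
  return { ...agent, tools: [...agent.tools, tool] };
}

// Example: enabling drawing is just adding the dalle tool to an assistant.
const painter = withTool(
  { systemPrompt: "You are a helpful illustrator.", tools: [] },
  { id: "dalle", kind: "builtin", description: "Create images from text" },
);
```

Under this model a third-party plugin is added with the same call, which is what lets native capabilities and plugins be configured through a single interface.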
Implementation analysis
DALL·E 3 image generation API: https://platform.openai.com/docs/api-reference/images/create
Prompts: https://zhuanlan.zhihu.com/p/661290115
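As a rough illustration of the image generation endpoint linked above, the sketch below only builds the HTTP request; sending it and handling errors are left out. `buildImageRequest` is a hypothetical helper, and the API key is assumed to be supplied by the caller.

```typescript
type ImageSize = "1792x1024" | "1024x1024" | "1024x1792";

const IMAGES_ENDPOINT = "https://api.openai.com/v1/images/generations";

// Build the URL and fetch-style init object for one DALL·E 3 generation.
function buildImageRequest(
  apiKey: string,
  prompt: string,
  size: ImageSize = "1024x1024",
) {
  return {
    url: IMAGES_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // dall-e-3 generates one image per request, so n is fixed to 1.
      body: JSON.stringify({ model: "dall-e-3", prompt, n: 1, size }),
    },
  };
}
```

The result can be passed to `fetch(url, init)`; the JSON response is expected to carry the image under `data[0].url`. Because each request yields a single image, a tool call with several prompts would presumably map to one request per prompt.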