Allow input of image properties #1337

zhugeyicixin · 2024-06-12T03:59:34Z

Features
Allow image input as a dict[str, str] or a list of dict[str, str] so that image properties, such as detail (https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding), can be passed through. The images argument in llm.aask() can now be any of these:

None
str of image url or path (original feature)
list of str above (original feature)
dict as {'url': str of image url or path, 'detail': 'high' | 'low' | 'auto'} (new feature)
list of dict above (new feature)
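
One way to picture the accepted shapes is as a normalization step that turns every form into a list of dicts. The following is only a minimal sketch of that idea, not the code in this PR: `normalize_images` is a hypothetical helper name that does not exist in MetaGPT, and the assumption is that a bare string maps to a dict with only a `url` key.

```python
from typing import Optional, Union

ImageInput = Union[str, dict[str, str]]


def normalize_images(
    images: Optional[Union[ImageInput, list[ImageInput]]],
) -> list[dict[str, str]]:
    """Hypothetical helper: coerce every accepted form into a list of dicts."""
    if images is None:
        return []
    # A single str or dict is treated as a one-element list.
    if isinstance(images, (str, dict)):
        images = [images]
    normalized = []
    for image in images:
        if isinstance(image, str):
            # Original feature: a bare string carries no extra properties.
            normalized.append({"url": image})
        elif isinstance(image, dict):
            # New feature: copy the dict so the caller's object is not mutated.
            normalized.append(dict(image))
        else:
            raise TypeError(f"unsupported image type: {type(image).__name__}")
    return normalized
```

With a single internal shape, the code that builds the request only has to handle dicts, while callers keep the freedom to pass any of the five forms.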

Influence
Existing usage is not affected. Passing image properties such as detail is now supported in addition.

Result

import asyncio

from metagpt.llm import LLM
from metagpt.logs import logger


async def ask_w_image(
    question,
    images,
    llm,
    system_prompt,
) -> str:
    logger.info(f"Q: {question}")
    logger.info(f"image: {images}")
    rsp = await llm.aask(
        question,
        system_msgs=[system_prompt],
        images=images,
    )
    logger.info(f"A: {rsp}")
    return rsp


async def main():
    llm = LLM()
    await ask_w_image(
        question="Describe this image",
        images="https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png",
        llm=llm,
        system_prompt="You are a helpful AI assistant.",
    )
    await ask_w_image(
        question="Describe this image",
        images={
            "url": "https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png"
        },
        llm=llm,
        system_prompt="You are a helpful AI assistant.",
    )
    await ask_w_image(
        question="Describe this image",
        images={
            "url": "https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png",
            "detail": "low",
        },
        llm=llm,
        system_prompt="You are a helpful AI assistant.",
    )
    await ask_w_image(
        question="Describe this image",
        images={
            "url": "https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png",
            "detail": "high",
        },
        llm=llm,
        system_prompt="You are a helpful AI assistant.",
    )


if __name__ == "__main__":
    asyncio.run(main())


# output

2024-06-11 20:34:26.244 | INFO     | __main__:ask_w_image:20 - Q: Describe this image
2024-06-11 20:34:26.245 | INFO     | __main__:ask_w_image:21 - image: https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png
The image depicts a black and white abstract logo consisting of four interconnected shapes that resemble stylized, curved arrows or loops. These shapes form a symmetrical pattern, creating a sense of movement and flow. The design is bold and modern, with clean lines and a balanced composition. The overall effect is dynamic and visually engaging.
2024-06-11 20:34:29.599 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.001 | Max budget: $10.000 | Current cost: $0.001, prompt_tokens: 21, completion_tokens: 63
2024-06-11 20:34:29.599 | INFO     | __main__:ask_w_image:27 - A: The image depicts a black and white abstract logo consisting of four interconnected shapes that resemble stylized, curved arrows or loops. These shapes form a symmetrical pattern, creating a sense of movement and flow. The design is bold and modern, with clean lines and a balanced composition. The overall effect is dynamic and visually engaging.
2024-06-11 20:34:29.599 | INFO     | __main__:ask_w_image:20 - Q: Describe this image
2024-06-11 20:34:29.599 | INFO     | __main__:ask_w_image:21 - image: {'url': 'https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png'}
The image depicts a black and white abstract logo or symbol. It consists of four identical, curved shapes that are arranged in a circular pattern, creating a sense of motion and symmetry. Each shape has a pointed end and a rounded end, and they are interconnected in a way that forms a continuous loop. The overall design is dynamic and balanced, with a modern and sleek appearance. The use of black and white adds to the simplicity and elegance of the symbol.
2024-06-11 20:34:34.029 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.003 | Max budget: $10.000 | Current cost: $0.001, prompt_tokens: 21, completion_tokens: 91
2024-06-11 20:34:34.030 | INFO     | __main__:ask_w_image:27 - A: The image depicts a black and white abstract logo or symbol. It consists of four identical, curved shapes that are arranged in a circular pattern, creating a sense of motion and symmetry. Each shape has a pointed end and a rounded end, and they are interconnected in a way that forms a continuous loop. The overall design is dynamic and balanced, with a modern and sleek appearance. The use of black and white adds to the simplicity and elegance of the symbol.
2024-06-11 20:34:34.030 | INFO     | __main__:ask_w_image:20 - Q: Describe this image
2024-06-11 20:34:34.030 | INFO     | __main__:ask_w_image:21 - image: {'url': 'https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png', 'detail': 'low'}
The image features a black and white abstract design composed of four interlocking shapes. Each shape resembles a curved, elongated teardrop or a stylized leaf, and they are arranged in a circular pattern, creating a symmetrical and balanced appearance. The shapes are connected at their tips, forming a continuous loop that gives the design a sense of unity and flow. The overall effect is visually striking and harmonious.
2024-06-11 20:34:37.175 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.004 | Max budget: $10.000 | Current cost: $0.001, prompt_tokens: 21, completion_tokens: 80
2024-06-11 20:34:37.175 | INFO     | __main__:ask_w_image:27 - A: The image features a black and white abstract design composed of four interlocking shapes. Each shape resembles a curved, elongated teardrop or a stylized leaf, and they are arranged in a circular pattern, creating a symmetrical and balanced appearance. The shapes are connected at their tips, forming a continuous loop that gives the design a sense of unity and flow. The overall effect is visually striking and harmonious.
2024-06-11 20:34:37.175 | INFO     | __main__:ask_w_image:20 - Q: Describe this image
2024-06-11 20:34:37.175 | INFO     | __main__:ask_w_image:21 - image: {'url': 'https://raw.githubusercontent.com/geekan/MetaGPT/main/docs/resources/MetaGPT-new-log.png', 'detail': 'high'}
The image depicts a black and white abstract logo or symbol. It consists of four identical, curved shapes that are arranged in a circular pattern, creating a sense of motion and symmetry. Each shape has a pointed end and a rounded end, and they are interconnected in a way that forms a continuous loop. The overall design is dynamic and balanced, with a modern and sleek appearance. The use of black and white adds to the simplicity and elegance of the symbol.
2024-06-11 20:34:40.956 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.005 | Max budget: $10.000 | Current cost: $0.001, prompt_tokens: 21, completion_tokens: 91
2024-06-11 20:34:40.957 | INFO     | __main__:ask_w_image:27 - A: The image depicts a black and white abstract logo or symbol. It consists of four identical, curved shapes that are arranged in a circular pattern, creating a sense of motion and symmetry. Each shape has a pointed end and a rounded end, and they are interconnected in a way that forms a continuous loop. The overall design is dynamic and balanced, with a modern and sleek appearance. The use of black and white adds to the simplicity and elegance of the symbol.

better629 · 2024-07-15T11:57:10Z

metagpt/provider/base_llm.py

            # image url or image base64
-            url = image if image.startswith("http") else f"data:image/jpeg;base64,{image}"
+            if not image_info["url"].startswith("http"):


This doesn't check that image_info actually has a url key. Passing a dict without url would raise a KeyError at image_info["url"]. Maybe you should add validation.
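
A minimal sketch of the kind of validation suggested here, assuming the dict form described in the PR (`url` required, `detail` optional and one of 'high', 'low', 'auto'). `build_image_part` and `ALLOWED_DETAIL` are hypothetical names, not part of metagpt/provider/base_llm.py; the base64 fallback mirrors the removed line in the diff above, and the returned dict follows the `image_url` content-part shape used by the OpenAI vision API.

```python
ALLOWED_DETAIL = {"high", "low", "auto"}


def build_image_part(image_info: dict[str, str]) -> dict:
    """Hypothetical helper: validate image_info, then build an image_url part."""
    url = image_info.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("image_info must contain a non-empty 'url' string")
    detail = image_info.get("detail")
    if detail is not None and detail not in ALLOWED_DETAIL:
        raise ValueError(
            f"'detail' must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}"
        )
    # Same fallback as the original line: non-http values are treated as base64 data.
    if not url.startswith("http"):
        url = f"data:image/jpeg;base64,{url}"
    image_url = {"url": url}
    # Only forward 'detail' when the caller supplied it.
    if detail is not None:
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}
```

Failing early with a ValueError that names the missing or invalid field gives the caller a clearer message than a bare KeyError raised deep inside the provider.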

codecov-commenter · 2024-08-28T22:34:48Z

Codecov Report

Attention: Patch coverage is 18.18182% with 9 lines in your changes missing coverage. Please review.

Project coverage is 62.56%. Comparing base (ab846f6) to head (11221e9).

Files with missing lines	Patch %	Lines
metagpt/provider/base_llm.py	18.18%	9 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1337      +/-   ##
==========================================
- Coverage   62.59%   62.56%   -0.03%     
==========================================
  Files         287      287              
  Lines       17589    17595       +6     
==========================================
  Hits        11009    11009              
- Misses       6580     6586       +6

Allow input of image properties

63327c9

Allow image input as a dict[str, str] or a list of dict[str, str] so that image properties, such as `detail` (https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding), can be passed through.

zhugeyicixin had a problem deploying to unittest June 12, 2024 03:59 — with GitHub Actions Failure

geekan requested a review from better629 July 15, 2024 07:55

better629 reviewed Jul 15, 2024

Merge branch 'geekan:main' into main

11221e9

zhugeyicixin requested a deployment to unittest August 28, 2024 22:18 — with GitHub Actions Waiting
