question about the image understanding #25

Open
df2046df opened this issue Jul 17, 2024 · 6 comments
Labels
inference (Something about inference), question (Further information is requested)

Comments

@df2046df

Does this model support multiple image inputs?

@JoyBoy-Su
Collaborator

Hi, thanks for your interest!
Anole supports multiple input images. To use them, give each image its own entry in input.json, then follow the instructions to run inference. Here's an example:

[
    {
        "type": "image",
        "content": "image1.png"
    },
    {
        "type": "image",
        "content": "image2.png"
    },
    {
        "type": "text",
        "content": "your instruction"
    }
]

Note that Anole's performance with multiple input images depends on the task, so results may differ from one task to another.
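
For longer image lists, the same structure could be generated rather than written by hand. This is only a sketch: `build_input` is a hypothetical helper, not part of the Anole repository, and it assumes the images-first, instruction-last layout from the example above.

```python
import json


def build_input(image_paths, instruction):
    """Build the input.json structure: one entry per image, then one text entry.

    Hypothetical helper for illustration; not part of the Anole code base.
    """
    segments = [{"type": "image", "content": path} for path in image_paths]
    segments.append({"type": "text", "content": instruction})
    return segments


if __name__ == "__main__":
    segments = build_input(["image1.png", "image2.png"], "your instruction")
    # Print the JSON; redirect the output to input.json to use it.
    print(json.dumps(segments, indent=4))
```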

@JoyBoy-Su added the question and inference labels on Jul 17, 2024
@df2046df
Author

Thank you for your reply! I ran into a problem with multiple images, though: when there are four or more input images, the following error occurs:

Traceback (most recent call last):
  File "/opt/data/private/code/anole/inference.py", line 133, in <module>
    main(args)
  File "/opt/data/private/code/anole/inference.py", line 107, in main
    segments = split_token_sequence(tokens, boi, eoi)
  File "/opt/data/private/code/anole/inference.py", line 32, in split_token_sequence
    batch_size, _ = tokens.shape
ValueError: not enough values to unpack (expected 2, got 1)

I printed the shape of tokens and got torch.Size([0]). What causes this?
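
The ValueError itself comes from unpacking a one-dimensional shape into two names. A guard along these lines, called before the unpacking, might make an empty result easier to spot. It is a sketch only: `check_token_shape` is a hypothetical helper, and it takes the shape as a plain tuple (for a tensor, `tuple(tokens.shape)`) so that it runs without PyTorch.

```python
def check_token_shape(shape):
    """Validate a token shape before unpacking it as (batch_size, seq_len).

    Hypothetical helper for illustration; not part of inference.py.
    """
    shape = tuple(shape)
    if len(shape) != 2:
        # A shape such as (0,) has a single dimension, so
        # `batch_size, _ = shape` would fail with
        # "not enough values to unpack (expected 2, got 1)".
        raise ValueError(
            f"expected 2-D tokens (batch_size, seq_len), got shape {shape}"
        )
    batch_size, seq_len = shape
    if seq_len == 0:
        raise ValueError("token sequence is empty")
    return batch_size, seq_len
```

With the shape reported here, `check_token_shape((0,))` raises a ValueError that names the offending shape instead of the generic unpacking message.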

@JoyBoy-Su
Collaborator

This is probably because Anole's default context length is 4096 tokens and each image takes 1026 tokens (1024 + boi + eoi). Four images already need 4 × 1026 = 4104 tokens, which exceeds the context length, so the model does not work properly with four or more input images.
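
The arithmetic behind this limit can be checked with a few lines. This is a rough sketch using only the numbers quoted in this thread; `max_images`, `fits`, and the `text_tokens` parameter are hypothetical names, and the estimate ignores any room needed for generated output.

```python
CONTEXT_LENGTH = 4096        # default Anole context length quoted in this thread
TOKENS_PER_IMAGE = 1024 + 2  # 1024 image tokens plus the boi and eoi markers


def max_images(context_length=CONTEXT_LENGTH,
               tokens_per_image=TOKENS_PER_IMAGE,
               text_tokens=0):
    """Return how many whole images fit after reserving text_tokens."""
    return max(0, (context_length - text_tokens) // tokens_per_image)


def fits(num_images, text_tokens=0):
    """Return True if num_images images plus the text fit in the context."""
    needed = num_images * TOKENS_PER_IMAGE + text_tokens
    return needed <= CONTEXT_LENGTH


if __name__ == "__main__":
    print(max_images())      # images that fit with no text at all
    print(fits(3), fits(4))  # three images fit, four do not
```

Since 3 × 1026 = 3078 and 4 × 1026 = 4104, at most three images fit, and any instruction text or generated tokens reduce the remaining budget further.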

@YiFang99

Is the number of tokens per image a parameter that users can set, or is it fixed?

@JoyBoy-Su
Collaborator

Sorry, the number of tokens per image is fixed; users cannot change it.

@df2046df
Author

I have another question. When I use the model for batch image understanding, the output is empty.
[Screenshot: Snipaste_2024-07-18_17-02-18]
What could be the reason for this?


3 participants