[multimodal] llava-1.5-7b-hf doesn't work on mmmu_val #2360

Open
BabyChouSr opened this issue Sep 26, 2024 · 4 comments
Labels: bug

BabyChouSr commented Sep 26, 2024

Reproduction:

lm_eval --model hf-multimodal \
    --model_args pretrained=llava-hf/llava-1.5-7b-hf,max_images=1 \
    --tasks mmmu_val \
    --device cuda:0 \
    --batch_size 8

Error:

File "/root/.venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/evaluator.py", line 496, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 674, in generate_until
    inputs = self.tok_batch_multimodal_encode(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 296, in tok_batch_multimodal_encode
    encoding = self.processor(
               ^^^^^^^^^^^^^^^
  File "/workspace/transformers/src/transformers/models/llava/processing_llava.py", line 134, in __call__
    image_inputs = self.image_processor(images, **output_kwargs["images_kwargs"])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/transformers/src/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/transformers/src/transformers/models/clip/image_processing_clip.py", line 286, in preprocess
    images = make_list_of_images(images)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/transformers/src/transformers/image_utils.py", line 205, in make_list_of_images
    raise ValueError(
ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got <class 'list'>.
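
A minimal sketch (my guess at the failure mode, not code from the harness) of what the CLIP image processor at the bottom of that traceback accepts: a flat list of PIL images works, while a nested per-sample list raises the same ValueError.

from PIL import Image
from transformers import CLIPImageProcessor

image_processor = CLIPImageProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
img = Image.new("RGB", (336, 336))

nested = [[img], [img]]  # one inner list of images per batch sample
# image_processor(nested)  # raises: Invalid image type ... but got <class 'list'>

flat = [im for sample in nested for im in sample]
out = image_processor(flat, return_tensors="pt")  # a flat list of PIL images is accepted
print(out["pixel_values"].shape)  # torch.Size([2, 3, 336, 336])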

I tried using vLLM as well, but it also has an issue: the number of image tokens (4 * 576 = 2304) does not match the number of image placeholders (2305).

haileyschoelkopf added the bug label Sep 26, 2024

haileyschoelkopf (Collaborator) commented:

Hi! We'll take a look at this. If I recall correctly, this is due to an inconsistency in the input formats for this model compared to other HF AutoModelForVision2Seq models and their corresponding processors.

BabyChouSr (Author) commented:

Thanks for the quick reply! It doesn't seem to be just llava-1.5-7b, however; I have issues with Idefics2-8b as well.

Versions:

transformers==4.45.1

Command:

lm_eval --model hf-multimodal \
    --model_args pretrained=HuggingFaceM4/idefics2-8b,max_images=2,attn_implementation=flash_attention_2,dtype=bfloat16,convert_img_format=True \
    --tasks mmmu_val \
    --device cuda:0 \
    --batch_size 2

Traceback:

Traceback (most recent call last):
  File "/root/.venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/evaluator.py", line 496, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 686, in generate_until
    cont = self._model_multimodal_generate(inputs, stop=until, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 342, in _model_multimodal_generate
    return self.model.generate(
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2048, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3008, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1603, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1419, in forward
    inputs_embeds = self.inputs_merger(
                    ^^^^^^^^^^^^^^^^^^^
  File "/root/.venv/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1296, in inputs_merger
    new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [640, 4096] cannot be broadcast to indexing result of shape [0, 4096]
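
If it helps with debugging (a hypothetical snippet, not part of the harness): the inputs_merger step that fails here needs the number of <image> placeholder tokens in input_ids to match the number of image hidden-state rows, and an indexing result of shape [0, 4096] suggests the encoded batch ended up with no placeholders at all. Something like this could be used to check:

import torch
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")

def count_image_placeholders(input_ids: torch.Tensor) -> int:
    # Positions the model will try to overwrite with image hidden states in inputs_merger.
    return int((input_ids == image_token_id).sum())

# Compare count_image_placeholders(inputs["input_ids"]) against the number of image
# features implied by pixel_values to see whether text and images got out of sync.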

BabyChouSr (Author) commented:

I tried vLLM, and I think an additional image token is getting added somewhere in the context. When running

lm_eval --model vllm-vlm \
    --model_args pretrained=llava-hf/llava-1.5-7b-hf,max_images=1 \
    --tasks mmmu_val_architecture_and_engineering \
    --device cuda:0 \
    --batch_size 1

I noticed that inputs[7] has 2 image tokens in it even though I set max_images to 1. I'm not that familiar with the codebase, so I'm not sure where the image tokens are being set, but I hope this helps.
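
A rough way to check this (hypothetical helper, not from the harness) is to count the placeholder substrings in each rendered prompt before it is handed to vLLM:

def count_placeholders(prompt: str, placeholder: str = "<image>") -> int:
    # Number of image placeholders in a single rendered prompt string.
    return prompt.count(placeholder)

# With max_images=1 each prompt should contain at most one placeholder.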

haileyschoelkopf (Collaborator) commented Sep 27, 2024

Thanks @BabyChouSr, this is helpful -- in our testing we found Idefics2 would run and avoid this error when setting max_images=2, so that error is surprising to me :( I haven't yet traced the root cause.

(@baberabb, also making you aware of this thread in case you hadn't seen it!)
