
Unexpected Prediction Result on Zero-Shot VQA Task #41

Open
zsun5 opened this issue Sep 24, 2024 · 1 comment

Comments


zsun5 commented Sep 24, 2024

Hello,

I am running a zero-shot evaluation on the VQA task but am getting unexpected results. The entries in _predict.json look like this:

{"question_id": "10", "answer": "no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no"}, {"question_id": "12", "answer": "no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no"}, {"question_id": "13", "answer": "yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes 
yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes"}, {"question_id": "19", "answer": "<code_2640><code_5423><code_5423><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_279><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_4021><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_5151><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code
_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026><code_3026>"}
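For anyone hitting the same symptom, a quick sanity check is to flag degenerate answers by their unique-token ratio. This is a hypothetical helper, not part of the repository; it assumes `_predict.json` holds a list of `{question_id, answer}` objects as shown above:

```python
def is_degenerate(answer: str, max_unique_ratio: float = 0.2, min_tokens: int = 10) -> bool:
    """Flag answers that are mostly the same few tokens repeated over and over."""
    tokens = answer.split()
    if len(tokens) < min_tokens:
        return False  # too short to judge
    return len(set(tokens)) / len(tokens) <= max_unique_ratio

# Inline sample mimicking the _predict.json entries above.
predictions = [
    {"question_id": "10", "answer": "no " * 180},
    {"question_id": "42", "answer": "a man riding a horse on the beach"},
]

flagged = [p["question_id"] for p in predictions if is_degenerate(p["answer"])]
print(flagged)  # only the repetitive answer is flagged
```

Running this over the full prediction file gives a quick count of how many answers collapsed into repetition before digging into model-side causes.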

When I run zero-shot predictions on my own dataset, the final results are all in the format of <code_xxxx>.
Do you have any idea what might be causing this?

Thank you very much.

taokz (Owner) commented Sep 25, 2024

This is a common phenomenon in zero-shot settings: the pretraining corpus lacks diverse VQA (instruction-following) data, which limits the model's ability to understand human intent. The <code_xxxx> tokens are image codes used for masked image infilling (a pretraining task), so their appearance means the model is mistakenly interpreting the question as a prompt for image infilling.

To address this issue, one option is to use the instruction-tuned checkpoints provided in this repository. Alternatively, I recommend fine-tuning the model for better performance.
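If fine-tuning is not immediately feasible, a decode-time repetition penalty can at least suppress the degenerate loops. Below is a minimal sketch of the standard CTRL-style rule (divide an already-generated token's logit if it is positive, multiply if negative), applied during greedy decoding over toy logits; this is an illustration of the technique, not this repository's actual inference code:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Downweight tokens that have already been generated (CTRL-style rule)."""
    out = list(logits)
    for tid in set(generated_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

def greedy_decode(step_logits, penalty=1.5):
    """Greedy decoding over precomputed per-step logits, penalizing repeats."""
    generated = []
    for logits in step_logits:
        penalized = apply_repetition_penalty(logits, generated, penalty)
        generated.append(max(range(len(penalized)), key=penalized.__getitem__))
    return generated

# Toy example: token 0 always has the highest raw logit, so unpenalized
# greedy decoding would emit [0, 0, 0]; the penalty breaks the loop.
steps = [[2.0, 1.8, 0.5], [2.0, 1.8, 0.5], [2.0, 1.8, 0.5]]
print(greedy_decode(steps))
```

In practice, libraries such as Hugging Face Transformers expose this as a generation parameter (e.g. `repetition_penalty`), but a penalty only masks the symptom; instruction tuning or fine-tuning remains the real fix.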
