Paper claims there are 10-choices but the test split has varying number of choices (anywhere from 3 to 10) #24
Comments
Hi there, our paper says that the questions are augmented to 10 options. A strict human and machine quality check then removes the low-quality options, so about 17% of the questions were affected.
Do you plan to remove those 17% from the HF Hub, or do you plan to augment them somehow to get 10 choices? We have an ongoing discussion about it here: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/947#66f3e0bae2e1cb781da1c769
I see. A simple solution would be to pad the 17% of questions with "N/A" options so that each question reaches 10 options. Would that approach work with the normalization strategy?
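The proposed padding can be sketched as follows. This is a minimal illustration, not part of the dataset tooling; `pad_options` is a hypothetical helper name.

```python
def pad_options(options, target=10, filler="N/A"):
    """Return a copy of `options` extended with filler strings up to `target` entries."""
    return options + [filler] * max(0, target - len(options))

# A question with 3 real choices would gain 7 "N/A" fillers:
padded = pad_options(["A", "B", "C"])
print(len(padded))  # 10
```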
Unfortunately no, because the idea of normalization is to subtract the random-baseline accuracy first, and then to rescale the result back to 0-100 (more details here: https://huggingface.co/spaces/open-llm-leaderboard/blog).
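The normalization described above can be sketched like this. The exact formula is assumed from the description (subtract the random-guess baseline `1/num_choices`, then rescale to 0-100); see the linked blog post for the authoritative version.

```python
def normalize(acc, num_choices):
    """Rescale raw accuracy so that random guessing maps to 0 and perfect accuracy to 100."""
    baseline = 1.0 / num_choices
    return 100.0 * (acc - baseline) / (1.0 - baseline)

# The same raw accuracy normalizes very differently depending on the choice count:
print(normalize(0.40, 10))  # ≈ 33.3 (baseline 10%)
print(normalize(0.40, 3))   # ≈ 10.0 (baseline ~33%)
```

This illustrates the crux of the disagreement: padding with "N/A" changes the nominal choice count used for the baseline, but a model is unlikely to pick an "N/A" option, so the effective random baseline for those questions may differ from `1/10`.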
I have read the blog. I still think that padding the 17% of questions with "N/A" options up to 10 options should work. Would you mind pointing out the issue here?
Hi folks, thanks for creating the dataset.
In your paper and the dataset card, you claim that MMLU-PRO has 10 choices for each question, which seems to be false. By opening the Viewer tab and selecting the test split, one can see that only 83% of questions have 10 choices, while the remaining ones have anywhere from 3 to 10. What is happening here?