Paper claims there are 10 choices but the test split has a varying number of choices (anywhere from 3 to 10) #24

Open
eldarkurtic opened this issue Sep 25, 2024 · 5 comments

Comments

@eldarkurtic

Hi folks, thanks for creating the dataset.
In your paper and the dataset card, you claim that MMLU-Pro has 10 choices for each question, which seems to be false.
By opening the Viewer tab and selecting the test split, one can see that only 83% of the questions have 10 choices; the remaining ones have anywhere from 3 to 10.
What is happening here?
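
For reference, here is a quick way to reproduce the count locally (a sketch assuming the dataset is hosted at TIGER-Lab/MMLU-Pro and stores the choices in an "options" column):

```python
from collections import Counter

from datasets import load_dataset

# Assumption: the dataset lives at TIGER-Lab/MMLU-Pro on the Hub and
# keeps the answer choices in an "options" column.
ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

counts = Counter(len(row["options"]) for row in ds)
total = sum(counts.values())
for n_options, n_questions in sorted(counts.items()):
    print(f"{n_options} options: {n_questions} questions ({100 * n_questions / total:.1f}%)")
```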

@eldarkurtic eldarkurtic changed the title Paper claims there are 10-choices but the test split has varying number from choices (anywhere from 3 to 10) Paper claims there are 10-choices but the test split has varying number of choices (anywhere from 3 to 10) Sep 25, 2024
@wenhuchen
Contributor

Hi there, our paper says that the questions are augmented to 10 options. Our strict human and machine quality checks then remove the low-quality options, which is why 17% of the questions were impacted.

@eldarkurtic
Author

Do you plan to remove those 17% from the HF Hub, or do you plan to augment them somehow to get 10 choices?
I am asking because the Open LLM Leaderboard v2 uses MMLU-Pro as one of its tasks, and the normalization of scores is affected: it was implemented following the claim that each question has 10 choices (so scores were normalized against a random baseline of 1/10).

We have an ongoing discussion about it here https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/947#66f3e0bae2e1cb781da1c769

@wenhuchen
Contributor

I see. A simple solution would be to pad the 17% of questions with "N/A" options so that each question physically reaches 10 options. Would that approach work for the normalization strategy?
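
For illustration, the padding could look something like this (a sketch; the "options" column name, the Hub path, and the `pad_options` helper are assumptions for this example, not existing repo code):

```python
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")  # assumed Hub path

def pad_options(example, target=10, filler="N/A"):
    # Append "N/A" entries until the question has exactly `target` options.
    missing = target - len(example["options"])
    example["options"] = example["options"] + [filler] * missing
    return example

padded = ds.map(pad_options)
assert all(len(row["options"]) == 10 for row in padded)
```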

@eldarkurtic
Author

Unfortunately no, because the idea of normalization is to subtract the random-baseline accuracy first and then rescale the result back to 0-100 (more details here: https://huggingface.co/spaces/open-llm-leaderboard/blog).
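
In pseudocode, the normalization roughly works like this (a minimal sketch of the idea described in that blog post, not the leaderboard's exact implementation):

```python
def normalize(acc: float, num_choices: int) -> float:
    """Subtract the random-baseline accuracy and rescale to 0-100."""
    baseline = 1.0 / num_choices
    return 100.0 * (acc - baseline) / (1.0 - baseline)

# A random guesser on a true 4-choice question scores ~0.25 raw accuracy,
# which should normalize to 0 -- but under the 10-choice assumption it doesn't:
print(normalize(0.25, num_choices=4))   # 0.0
print(normalize(0.25, num_choices=10))  # ~16.7
```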

@wenhuchen
Contributor

I have read the blog. I still think that padding the 17% of questions with "N/A" options up to 10 options should work. Would you mind pointing out what the issue is here?
