Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MMLU-SR
mmlusr aims to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms.
Authors
Implementation
We have task.py under mmlusr folder, which is a custom method to load answer choices from HuggingFace.
Usage
We need to figure out a way to run all tasks on Genbench. In our Git repo, it's easily to run all tasks and we specifically made every single task a config file line so that it's simple to pick any task user wants. But the loading strategy I see here, for now we have to manually change the task name in config.jsonnet. We cannot change on the huggingface side as it's already used in lm-eval-harness repo.
Checklist:
genbench-cli test-task
tool.