[Task Submission] mmlusr (`mmlusr`) #3

SkySuperCat · 2024-10-22T06:54:15Z

MMLU-SR

mmlusr aims to measure the true comprehension abilities of Large Language Models (LLMs) by testing them on question-answering tasks in which terms have been modified.

Authors

Wentian Wang, [email protected]
Sarthak Jain
Paul Kantor
Jacob Feldman
Lazaros Gallos
Hao Wang

Implementation

The mmlusr folder contains task.py, which implements a custom method for loading answer choices from HuggingFace.
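As a rough illustration of what loading answer choices can involve, the sketch below turns one dataset row into a question, a list of options, and a target index. It is not the code in task.py: the column names (`question`, `choice1` to `choice4`, `answer`), the output keys, and the helper name `normalize_row` are all assumptions made for this example.

```python
from typing import Any, Dict, List

# Hypothetical column layout: one column per answer choice plus an integer answer index.
CHOICE_COLUMNS: List[str] = ["choice1", "choice2", "choice3", "choice4"]


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the per-column choices into a list and validate the answer index."""
    options = [str(row[column]) for column in CHOICE_COLUMNS]
    target = int(row["answer"])
    if not 0 <= target < len(options):
        raise ValueError(f"answer index {target} is out of range")
    return {
        "input": str(row["question"]).strip(),
        "target_options": options,
        "target": target,
    }


# Made-up row, used only to show the shape of the result.
example = normalize_row(
    {
        "question": " What is 2 + 2? ",
        "choice1": "3",
        "choice2": "4",
        "choice3": "5",
        "choice4": "6",
        "answer": 1,
    }
)
print(example["input"], example["target_options"][example["target"]])
```

Keeping the choices in a single list makes the answer index easy to validate in one place, whatever the real column names turn out to be.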

Usage

We still need a way to run all tasks on GenBench. In our Git repo, it is easy to run all tasks: we made every task a separate line in a config file so that users can simply pick any task they want. With the loading strategy used here, however, we currently have to change the task name in config.jsonnet manually. We cannot change this on the HuggingFace side, because the dataset is already used in the lm-eval-harness repo.
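One possible workaround, sketched here only as an idea, is to generate one config text per task from a shared template instead of editing the task name by hand. Everything in this sketch is hypothetical: the `SUBSET_NAME` placeholder, the template fields, the subset names, and the helper `render_configs` are not part of GenBench or of this PR.

```python
from typing import Dict, Iterable


def render_configs(
    template: str,
    subsets: Iterable[str],
    placeholder: str = "SUBSET_NAME",
) -> Dict[str, str]:
    """Return one config text per subset by substituting the placeholder."""
    if placeholder not in template:
        raise ValueError(f"placeholder {placeholder!r} not found in template")
    return {name: template.replace(placeholder, name) for name in subsets}


# Hypothetical template; the real config.jsonnet has its own fields.
TEMPLATE = '{ name: "mmlusr", subset: "SUBSET_NAME" }'

for name, text in render_configs(TEMPLATE, ["subset_a", "subset_b"]).items():
    print(name, "->", text)
```

Each rendered text could then be written to its own file, so that picking a task means picking a file rather than editing one.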

Checklist:

[√] I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
[√] Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
[√] I have read the description of what should be in the doc.md of my task, and have added the required arguments.
[√] I have submitted or will submit an accompanying paper to the GenBench workshop.

Add mmlusr

47cbbef
