[Task Submission] mmlusr (mmlusr) #3

Open: 1 commit to merge into base: main

SkySuperCat commented Oct 22, 2024

MMLU-SR

MMLU-SR (mmlusr) aims to measure the true comprehension abilities of Large Language Models (LLMs) by testing their performance on question-answering tasks in which terms have been modified.

Authors

  • Wentian Wang, [email protected]
  • Sarthak Jain
  • Paul Kantor
  • Jacob Feldman
  • Lazaros Gallos
  • Hao Wang

Implementation

The mmlusr folder contains a task.py, which implements a custom method for loading the answer choices from HuggingFace.
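To illustrate the kind of work such a custom loading step does, here is a minimal sketch that turns one dataset row into a multiple-choice example. It is not the code in task.py: the column names (`question`, `choice1` to `choice4`, `answer`) and the output keys are assumptions made for illustration, and the real dataset schema may differ.

```python
from typing import Any, Dict, List

# Hypothetical column names; the real dataset schema may differ.
CHOICE_COLUMNS: List[str] = ["choice1", "choice2", "choice3", "choice4"]
LETTERS = "ABCD"


def format_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Build a multiple-choice prompt from one row (illustrative only)."""
    # Collect the answer choices from their separate columns.
    choices = [str(row[col]) for col in CHOICE_COLUMNS]

    # Question first, then one lettered line per choice, then the answer cue.
    lines = [str(row["question"])]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Answer:")

    return {
        "input": "\n".join(lines),
        "target_options": list(LETTERS[: len(choices)]),
        # Assumes the answer is stored as a 0-based index into the choices.
        "target": int(row["answer"]),
    }
```

Keeping the formatting in a plain function like this makes it easy to check against a hand-written row before wiring it into the task class.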

Usage

We still need to find a way to run all tasks on GenBench. In our Git repo, it is easy to run all tasks: we deliberately gave every task its own line in the config file, so that users can simply pick whichever task they want. With the loading strategy used here, however, we currently have to change the task name in config.jsonnet manually. We cannot change this on the HuggingFace side, because the dataset is already used in the lm-eval-harness repo.
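One possible workaround, sketched below, is to keep a single config template and generate one config text per task name instead of editing config.jsonnet by hand. This is only an idea, not something GenBench provides: the `__SUBSET__` placeholder, the `render_configs` helper, and the subset names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical marker that stands in for the task name inside the template.
PLACEHOLDER = "__SUBSET__"


def render_configs(template: str, subsets: List[str]) -> Dict[str, str]:
    """Return one rendered config text per subset name (illustrative only)."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {PLACEHOLDER!r}")
    # Substitute the placeholder once per subset; the template is not modified.
    return {name: template.replace(PLACEHOLDER, name) for name in subsets}


if __name__ == "__main__":
    # Made-up template and subset names, only to show the call.
    template = '{ task_name: "__SUBSET__" }'
    for name, text in render_configs(template, ["subset_a", "subset_b"]).items():
        print(name, "->", text)
```

Each rendered text could then be written to its own config file, at the cost of keeping the generated files in sync with the template.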

Checklist:

  • [√] I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
  • [√] Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
  • [√] I have read the description of what should be in the doc.md of my task, and have added the required arguments.
  • [√] I have submitted or will submit an accompanying paper to the GenBench workshop.
