Add KoCommonGEN v2 benchmark #2208

Open
wants to merge 1 commit into
base: main

metterian

Description:
This PR adds support for the KoCommonGEN v2 benchmark, a new dataset for evaluating Korean commonsense reasoning in large language models.

Changes:

  • Added KoCommonGEN v2 task definition
  • Updated task list to include ko_commongen_v2
  • Added citation information for the benchmark

KoCommonGEN v2 Details:

This benchmark provides a valuable resource for evaluating Korean language models on commonsense reasoning tasks. Adding it to our evaluation suite will help broaden our coverage of multilingual NLP capabilities.

Please review and let me know if any changes or additional information is needed.

Comment on lines +16 to +18
# "choices": [f"{doc[str(i+1)]}" for i in range(4)],
# "choices": [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)], # The list of choices.
# "choices": [str(i+1) for i in range(4)], # The list of choices.
Contributor

Just want to check if this is an alternative option (which is why it's commented but left in)?

Collaborator

+1, if this comment is safe to delete let's do so!
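
For reference, a minimal sketch of what each of the three commented-out variants would produce. The `doc` record below is made up for illustration; only the keys "1" through "4" are taken from the indexing used in the snippet.

```python
# Hypothetical record; only the keys "1".."4" come from the snippet under review.
doc = {"1": "choice A", "2": "choice B", "3": "choice C", "4": "choice D"}

# Variant 1: the raw choice text only.
text_only = [f"{doc[str(i+1)]}" for i in range(4)]

# Variant 2: a number prefix followed by the choice text.
numbered = [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)]

# Variant 3: the number labels only.
labels_only = [str(i+1) for i in range(4)]

print(text_only)
print(numbered)
print(labels_only)
```

The variants differ only in the strings the model is scored on, so keeping more than one in the file as comments does not change behavior.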

Collaborator

haileyschoelkopf left a comment

Thanks for the PR! Just a few small changes and then we can merge this.

@@ -0,0 +1,19 @@
task: ko_commongen_v2
Collaborator

Suggested change (delete this line):
task: ko_commongen_v2

Collaborator

let's remove the task: field since this is a template/stub config

out_doc = {
"query": query,
"choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)],
# "choices": [f"{doc[str(i+1)]}" for i in range(4)],
Collaborator

Same here: if this commented-out line is safe to delete, let's remove it.
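
A minimal sketch of how the block could look once the commented-out alternatives are dropped. `build_out_doc` is a hypothetical helper name used only for illustration, and any fields of `out_doc` that are not visible in the excerpt are omitted here.

```python
def build_out_doc(doc: dict, query: str) -> dict:
    """Hypothetical helper: keeps only the active "choices" formatting."""
    return {
        "query": query,
        # Numbered choices, e.g. "1. <text of choice 1>".
        "choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)],
    }


# Made-up record, keyed "1".."4" as in the excerpt.
example = build_out_doc({"1": "a", "2": "b", "3": "c", "4": "d"}, "q")
print(example["choices"])
```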

@haileyschoelkopf
Collaborator

Hi @metterian, just following up to see if you'd be able to make these final few changes so we can merge this task! If not, we'll try to get to them ourselves.

Note also that we'd ideally have an entry in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md describing the task as well, so users know about your task!
