Add KoCommonGEN v2 benchmark #2208

Open
wants to merge 1 commit into
base: main
40 changes: 40 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/README.md
@@ -0,0 +1,40 @@
# KoCommonGEN v2

### Paper

Title: `KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models`

Abstract: The paper presents KoCommonGEN v2, a benchmark for evaluating Korean commonsense reasoning in large language models. It was accepted to Findings of ACL 2024.

Homepage: https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2

### Groups and Tasks

#### Groups


#### Tasks

- `ko_commongen_v2`
- `ko_commongen_v2_china` (code-switching)
- `ko_commongen_v2_japan` (code-switching)
- `ko_commongen_v2_korean` (code-switching)
- `ko_commongen_v2_english` (code-switching)
- `ko_commongen_v2_espanol` (code-switching)

### Citation

```
@inproceedings{seo2024Kocommongenv2,
title = "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models",
author = "Jaehyung Seo and Jaewook Lee and Chanjun Park and SeongTae Hong and Seungjun Lee and Heuiseok Lim",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "TBD",
doi = "TBD",
pages = "TBD"
}
```
19 changes: 19 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/_default
@@ -0,0 +1,19 @@
task: ko_commongen_v2
Collaborator (suggested change): delete `task: ko_commongen_v2`
Collaborator: let's remove the `task:` field since this is a template/stub config

dataset_path: nlpai-lab/ko_commongen_v2_code_switching
output_type: multiple_choice
training_split: null
validation_split: null
process_docs: !function utils.process_docs
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
num_fewshot: 0
metadata:
  version: 0.0
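
The two entries in `metric_list` differ only in how the per-choice log-likelihoods are compared. The sketch below illustrates that difference in plain Python. It assumes `acc` picks the choice with the highest raw log-likelihood and `acc_norm` divides each score by the character length of the choice string first. The function name `score_multiple_choice` and the scores are hypothetical and are not taken from the harness.

```python
from typing import List, Tuple


def score_multiple_choice(
    loglikelihoods: List[float], choices: List[str], gold: int
) -> Tuple[float, float]:
    """Return (acc, acc_norm) for one document.

    Assumption: acc_norm normalizes each log-likelihood by the character
    length of its choice string before taking the argmax.
    """
    indices = range(len(choices))
    best_raw = max(indices, key=lambda i: loglikelihoods[i])
    best_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(best_raw == gold), float(best_norm == gold)


# Hypothetical scores: the short choice wins on raw log-likelihood,
# the longer gold choice wins after length normalization.
choices = ["1. short", "2. a much longer candidate sentence"]
loglikelihoods = [-8.0, -12.0]
acc, acc_norm = score_multiple_choice(loglikelihoods, choices, gold=1)
print(acc, acc_norm)
```

Because the choices here carry full sentences rather than single letters, their lengths can vary a lot, which is one reason to report both metrics.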

5 changes: 5 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/china.yaml
@@ -0,0 +1,5 @@
include: _default
task: ko_commongen_v2_china
test_split: china
description: "以下任务是结合给定概念信息 concept set: 中的形态素,创造出符合常识的句子。请从组合 concept set: 中的形态素所生成的4个例句中,选出包含最符合常识且合理的句子的选项编号。"
doc_to_text: "{{query}}\n请回答:"
6 changes: 6 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/english.yaml
@@ -0,0 +1,6 @@
include: _default
task: ko_commongen_v2_english
test_split: english
description: "The following task involves combining morphemes from the concept set: to create a sentence that is consistent with commonsense. Choose the number of the option that contains the most logically valid sentence among the four examples created by combining morphemes from the concept set: "
doc_to_text: "{{query}}\nAnswer:"

7 changes: 7 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/espanol.yaml
@@ -0,0 +1,7 @@
include: _default
task: ko_commongen_v2_espanol
test_split: espanol
description: "La siguiente tarea consiste en combinar morfemas existentes en el conjunto de conceptos dado, concept set:, para crear una oración que concuerde con el sentido común. Elige el número de la opción que incluya la oración más coherente y válida entre los cuatro ejemplos creados combinando morfemas del concept set: ."
doc_to_text: "{{query}}\nContesta:"


7 changes: 7 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/japan.yaml
@@ -0,0 +1,7 @@
include: _default
task: ko_commongen_v2_japan
test_split: japan
description: "次は、与えられた概念情報 concept set: に存在する形態素を組み合わせて、常識に合う文を作る作業です。 concept set: の形態素を組み合わせて作った4つの例の中から、最も常識的で妥当な文を含む選択肢の番号を選んでください。"
doc_to_text: "{{query}}\n正答:"


5 changes: 5 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/korean.yaml
@@ -0,0 +1,5 @@
include: _default
task: ko_commongen_v2_korean
test_split: korean
description: "다음은 주어진 개념정보인 concept set: 에 존재하는 형태소를 조합해서 상식에 부합하는 문장을 만드는 작업이다. concept set: 의 형태소를 조합하여 만든 4개의 예시 중에서 가장 상식적으로 타당한 문장을 포함한 선택지의 번호를 선택하라."
doc_to_text: "{{query}}\n정답:"
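
Each language file above starts from `_default` and overrides a few keys. A rough sketch of that composition follows, assuming the including file's keys are applied on top of the included ones as a plain dictionary update. The dictionaries are abbreviated stand-ins for the YAML files, and `resolve_include` is a hypothetical helper, not a harness function.

```python
from typing import Any, Dict


def resolve_include(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a config on top of the config it includes.

    Assumption: keys in the including file win over keys from `_default`,
    and the `include` key itself is dropped from the result.
    """
    merged = dict(base)
    merged.update({k: v for k, v in override.items() if k != "include"})
    return merged


# Abbreviated stand-in for `_default`
default_cfg = {
    "task": "ko_commongen_v2",
    "dataset_path": "nlpai-lab/ko_commongen_v2_code_switching",
    "output_type": "multiple_choice",
    "num_fewshot": 0,
}

# Abbreviated stand-in for `korean.yaml`
korean_cfg = {
    "include": "_default",
    "task": "ko_commongen_v2_korean",
    "test_split": "korean",
}

resolved = resolve_include(default_cfg, korean_cfg)
print(resolved["task"])
print(resolved["dataset_path"])
```

Under this assumption, the `task` value set in `_default` never survives the merge, since every language file sets its own.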
23 changes: 23 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/code_switching/utils.py
@@ -0,0 +1,23 @@
import re

import datasets



def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _process_doc(doc):
        query = (
            f"""concept set: {{{doc['concept_set'].replace("#", ", ")}}}\n""")
        query += "\n".join([f"{i+1}. {doc[str(i+1)]}" for i in range(4)])

        out_doc = {
            "query": query,
            "choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)],
            # "choices": [f"{doc[str(i+1)]}" for i in range(4)],
            # "choices": [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)],  # The list of choices.
            # "choices": [str(i+1) for i in range(4)],  # The list of choices.
Comment on lines +16 to +18
Contributor: Just want to check if this is an alternative option (which is why it's commented but left in)?
Collaborator: +1, if this comment is safe to delete let's do so!
            "gold": doc['gold']-1,  # The integer used to index into the correct element of `"choices"`.
        }
        return out_doc

    return dataset.map(_process_doc)
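
To make the transformation concrete, here is a minimal sketch of what `_process_doc` produces for a single record, written without the `datasets` dependency. The record layout (a `concept_set` string separated by `#`, option columns named `"1"` to `"4"`, and a 1-based `gold`) is read off the code above. The sample values are placeholders, not rows from the dataset.

```python
from typing import Any, Dict


def process_doc(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Standalone copy of the `_process_doc` logic for a single record."""
    concepts = doc["concept_set"].replace("#", ", ")
    options = [f"{i + 1}. {doc[str(i + 1)]}" for i in range(4)]
    # Doubled braces emit literal "{" and "}" around the concept list
    query = f"concept set: {{{concepts}}}\n" + "\n".join(options)
    # `gold` is 1-based in the source record and 0-based in the output
    return {"query": query, "choices": options, "gold": doc["gold"] - 1}


# Hypothetical record with placeholder text, not taken from the dataset
sample = {
    "concept_set": "a#b#c",
    "1": "first sentence",
    "2": "second sentence",
    "3": "third sentence",
    "4": "fourth sentence",
    "gold": 3,
}

out = process_doc(sample)
print(out["query"])
print(out["choices"][out["gold"]])
```

The shifted `gold` value indexes directly into `choices`, which is what `doc_to_target: "{{gold}}"` relies on.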
22 changes: 22 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/ko_commongen_v2.yaml
@@ -0,0 +1,22 @@
task: ko_commongen_v2
dataset_path: nlpai-lab/ko_commongen_v2
output_type: multiple_choice
training_split: train
test_split: test
description: "다음은 주어진 개념정보인 concept set: 에 존재하는 형태소를 조합해서 상식에 부합하는 문장을 만드는 작업이다. concept set: 의 형태소를 조합하여 만든 4개의 예시 중에서 가장 상식적으로 타당한 문장을 포함한 선택지의 번호를 선택하라."
process_docs: !function utils.process_docs
doc_to_text: "{{query}}\n정답:"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
num_fewshot: 2
metadata:
  version: 0.0

# Please set the seed to 42
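
As a rough illustration of how `doc_to_text` and `doc_to_choice` combine for one document, the sketch below fills the `{{query}}` placeholder and pairs the resulting context with each candidate continuation. It approximates Jinja2 rendering with a plain string replacement to stay dependency-free. The single-space `target_delimiter` between context and choice is an assumption, and both helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def render_doc_to_text(template: str, doc: Dict[str, Any]) -> str:
    """Fill `{{field}}` placeholders; a stand-in for real Jinja2 rendering."""
    text = template
    for key, value in doc.items():
        text = text.replace("{{" + key + "}}", str(value))
    return text


def build_requests(
    template: str, doc: Dict[str, Any], target_delimiter: str = " "
) -> List[Tuple[str, str]]:
    """Pair the rendered context with each candidate continuation."""
    context = render_doc_to_text(template, doc)
    return [(context, target_delimiter + choice) for choice in doc["choices"]]


# Hypothetical processed document with placeholder text
doc = {
    "query": "concept set: {a, b}\n1. x\n2. y\n3. z\n4. w",
    "choices": ["1. x", "2. y", "3. z", "4. w"],
    "gold": 0,
}

requests = build_requests("{{query}}\n정답:", doc)
for context, continuation in requests:
    print(repr(continuation))
```

Each pair would then be scored by log-likelihood, and the highest-scoring continuation compared against `gold`.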
23 changes: 23 additions & 0 deletions lm_eval/tasks/ko_commongen_v2/utils.py
@@ -0,0 +1,23 @@
import re

import datasets



def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _process_doc(doc):
        query = (
            f"""concept set: {{{doc['concept_set'].replace("#", ", ")}}}\n""")
        query += "\n".join([f"{i+1}. {doc[str(i+1)]}" for i in range(4)])

        out_doc = {
            "query": query,
            "choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)],
            # "choices": [f"{doc[str(i+1)]}" for i in range(4)],
Collaborator: Same here
            # "choices": [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)],  # The list of choices.
            # "choices": [str(i+1) for i in range(4)],  # The list of choices.
            "gold": doc['gold']-1,  # The integer used to index into the correct element of `"choices"`.
        }
        return out_doc

    return dataset.map(_process_doc)
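
The commented-out lines in `_process_doc` read like alternative definitions of `choices`. As a sketch of what each variant would hand to the scorer, the snippet below builds all of them for one placeholder record. The variant labels and the sample record are hypothetical. Which variant the authors intend to keep is not stated in the PR.

```python
from typing import Any, Dict, List


def choice_variants(doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Build each `choices` variant found in `_process_doc`."""
    return {
        # Active line: option number plus sentence
        "numbered": [f"{i + 1}. {doc[str(i + 1)]}" for i in range(4)],
        # Commented out: sentence only
        "sentence_only": [f"{doc[str(i + 1)]}" for i in range(4)],
        # Commented out: same strings as "numbered", built by concatenation
        "numbered_concat": [
            f"{str(i + 1)}. " + doc["{i}".format(i=i + 1)] for i in range(4)
        ],
        # Commented out: option number only
        "number_only": [str(i + 1) for i in range(4)],
    }


# Hypothetical record with placeholder text, not taken from the dataset
sample = {"1": "w", "2": "x", "3": "y", "4": "z", "gold": 2}

variants = choice_variants(sample)
for name, choices in variants.items():
    print(name, choices)
```

Two of the four variants yield identical strings, so only three distinct scoring setups are actually on the table: numbered sentence, bare sentence, and bare number.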