Add KoCommonGEN v2 benchmark #2208
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# KoCommonGEN v2 | ||
|
||
### Paper | ||
|
||
Title: `KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models` | ||
|
||
Abstract: The paper presents KoCommonGEN v2, a benchmark for evaluating Korean commonsense reasoning in large language models. It was accepted to Findings of ACL 2024. | ||
|
||
Homepage: https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2 | ||
|
||
### Groups and Tasks | ||
|
||
#### Groups | ||
|
||
|
||
#### Tasks | ||
|
||
- `ko_commongen_v2` | ||
- `ko_commongen_v2_china` (code-switching) | ||
- `ko_commongen_v2_japan` (code-switching) | ||
- `ko_commongen_v2_korean` (code-switching) | ||
- `ko_commongen_v2_english` (code-switching) | ||
- `ko_commongen_v2_espanol` (code-switching) | ||
|
||
### Citation | ||
|
||
``` | ||
@inproceedings{seo2024Kocommongenv2, | ||
title = "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models", | ||
author = "Jaehyung Seo and Jaewook Lee and Chanjun Park and SeongTae Hong and Seungjun Lee and Heuiseok Lim", | ||
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024", | ||
month = August, | ||
year = "2024", | ||
address = "Bangkok, Thailand", | ||
publisher = "Association for Computational Linguistics", | ||
url = "TBD", | ||
doi = "TBD", | ||
pages = "TBD" | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
task: ko_commongen_v2 | ||
dataset_path: nlpai-lab/ko_commongen_v2_code_switching | ||
output_type: multiple_choice | ||
training_split: null | ||
validation_split: null | ||
process_docs: !function utils.process_docs | ||
doc_to_target: "{{gold}}" | ||
doc_to_choice: "choices" | ||
metric_list: | ||
  - metric: acc | ||
    aggregation: mean | ||
    higher_is_better: true | ||
  - metric: acc_norm | ||
    aggregation: mean | ||
    higher_is_better: true | ||
num_fewshot: 0 | ||
metadata: | ||
  version: 0.0 | ||
|
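The config above reports both `acc` and `acc_norm`. As a rough illustration of how the two can disagree, the sketch below scores one multiple-choice document from per-choice log-likelihoods. It assumes that `acc_norm` divides each log-likelihood by the character length of its choice string before taking the argmax. The function name `score_multiple_choice` and the example scores are hypothetical and not part of this PR.

```python
def score_multiple_choice(loglikelihoods, choices, gold):
    """Return (acc, acc_norm) for a single document.

    acc: 1.0 if the choice with the highest raw log-likelihood is the gold one.
    acc_norm: same, but each log-likelihood is first divided by the
    character length of its choice (an assumed normalization).
    """
    indices = range(len(choices))
    pred = max(indices, key=lambda i: loglikelihoods[i])
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(pred == gold), float(pred_norm == gold)


# Hypothetical scores: the longer (gold) choice has a lower total
# log-likelihood but a better per-character score.
choices = ["1. short", "2. a much longer candidate sentence"]
lls = [-8.0, -14.0]
acc, acc_norm = score_multiple_choice(lls, choices, gold=1)
```

Because the choices in this task carry their option-number prefix (`"1. ..."`), the prefix contributes to the length used for normalization under this assumption.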
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
include: _default | ||
task: ko_commongen_v2_china | ||
test_split: china | ||
description: "以下任务是结合给定概念信息 concept set: 中的形态素,创造出符合常识的句子。 以下任务是结合给定概念信息 concept set: 中的形态素,创造出符合常识的句子。" | ||
doc_to_text: "{{query}}\n请回答:" |
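Each per-language file relies on `include: _default` and then overrides a few keys. The sketch below is a simplified model of that behaviour, assuming included keys act as defaults and keys in the including file win. It is not the harness's actual loader, and `resolve_include` is a hypothetical helper.

```python
def resolve_include(base, child):
    """Merge a child config over its included base; child keys take precedence."""
    merged = dict(base)
    # The "include" key only names the base file, so it is dropped from the result.
    merged.update({k: v for k, v in child.items() if k != "include"})
    return merged


# Abbreviated copies of the configs shown in this diff.
default_cfg = {
    "task": "ko_commongen_v2",
    "dataset_path": "nlpai-lab/ko_commongen_v2_code_switching",
    "output_type": "multiple_choice",
    "num_fewshot": 0,
}
china_cfg = {
    "include": "_default",
    "task": "ko_commongen_v2_china",
    "test_split": "china",
    "doc_to_text": "{{query}}\n请回答:",
}

resolved = resolve_include(default_cfg, china_cfg)
```

Under this model the `task` value in `_default` is always overridden by the including file, so it never reaches a resolved config.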
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
include: _default | ||
task: ko_commongen_v2_english | ||
test_split: english | ||
description: "The following task involves combining morphemes from the concept set: to create a sentence that is consistent with commonsense. Choose the number of the option that contains the most logically valid sentence among the four examples created by combining morphemes from the concept set: " | ||
doc_to_text: "{{query}}\nAnswer:" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
include: _default | ||
task: ko_commongen_v2_espanol | ||
test_split: espanol | ||
description: "La siguiente tarea consiste en combinar morfemas existentes en el conjunto de conceptos dado, concept set:, para crear una oración que concuerde con el sentido común. Elige el número de la opción que incluya la oración más coherente y válida entre los cuatro ejemplos creados combinando morfemas del concept set: ." | ||
doc_to_text: "{{query}}\nContesta:" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
include: _default | ||
task: ko_commongen_v2_japan | ||
test_split: japan | ||
description: "次は、与えられた概念情報 concept set: に存在する形態素を組み合わせて、常識に合う文を作る作業です。 concept set: の形態素を組み合わせて作った4つの例の中から、最も常識的で妥当な文を含む選択肢の番号を選んでください。" | ||
doc_to_text: "{{query}}\n正答:" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
include: _default | ||
task: ko_commongen_v2_korean | ||
test_split: korean | ||
description: "다음은 주어진 개념정보인 concept set: 에 존재하는 형태소를 조합해서 상식에 부합하는 문장을 만드는 작업이다. concept set: 의 형태소를 조합하여 만든 4개의 예시 중에서 가장 상식적으로 타당한 문장을 포함한 선택지의 번호를 선택하라." | ||
doc_to_text: "{{query}}\n정답:" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
import re | ||
|
||
import datasets | ||
|
||
|
||
|
||
def process_docs(dataset: datasets.Dataset) -> datasets.Dataset: | ||
    def _process_doc(doc): | ||
        query = ( | ||
            f"""concept set: {{{doc['concept_set'].replace("#", ", ")}}}\n""") | ||
        query += "\n".join([f"{i+1}. {doc[str(i+1)]}" for i in range(4)]) | ||
|
||
        out_doc = { | ||
            "query": query, | ||
            "choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)], | ||
            # "choices": [f"{doc[str(i+1)]}" for i in range(4)], | ||
            # "choices": [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)], # The list of choices. | ||
            # "choices": [str(i+1) for i in range(4)], # The list of choices. | ||
Review comment (on lines +16 to +18): Just want to check if this is an alternative option (which is why it's commented but left in)?
Reply: +1, if this comment is safe to delete let's do so!
||
            "gold": doc['gold']-1, # The integer used to index into the correct element of `"choices"`. | ||
        } | ||
        return out_doc | ||
|
||
    return dataset.map(_process_doc) |
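To make the effect of `_process_doc` concrete, here is a standalone sketch that applies the same transformation to a plain dictionary instead of a `datasets.Dataset` row. The field names (`concept_set`, `"1"` to `"4"`, `gold`) are taken from the code above. The sample record and the name `process_doc` are hypothetical, and real records are in Korean or code-switched text.

```python
def process_doc(doc):
    """Build the prompt, the numbered choices, and a 0-based gold index."""
    concepts = doc["concept_set"].replace("#", ", ")
    options = [f"{i}. {doc[str(i)]}" for i in range(1, 5)]
    query = f"concept set: {{{concepts}}}\n" + "\n".join(options)
    # The dataset stores a 1-based answer number; the harness indexes from 0.
    return {"query": query, "choices": options, "gold": doc["gold"] - 1}


# Hypothetical record, for illustration only.
sample = {
    "concept_set": "rain#umbrella#open",
    "1": "The umbrella opened the rain.",
    "2": "She opened her umbrella in the rain.",
    "3": "The rain opened under the umbrella.",
    "4": "Open the rain with an umbrella.",
    "gold": 2,
}

out = process_doc(sample)
print(out["query"])
```

With the uncommented `"choices"` variant, each choice string repeats the option number and sentence already listed in `query`, so the model is scored on continuing the prompt with the full numbered sentence.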
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
task: ko_commongen_v2 | ||
dataset_path: nlpai-lab/ko_commongen_v2 | ||
output_type: multiple_choice | ||
training_split: train | ||
test_split: test | ||
description: "다음은 주어진 개념정보인 concept set: 에 존재하는 형태소를 조합해서 상식에 부합하는 문장을 만드는 작업이다. concept set: 의 형태소를 조합하여 만든 4개의 예시 중에서 가장 상식적으로 타당한 문장을 포함한 선택지의 번호를 선택하라." | ||
process_docs: !function utils.process_docs | ||
doc_to_text: "{{query}}\n정답:" | ||
doc_to_target: "{{gold}}" | ||
doc_to_choice: "choices" | ||
metric_list: | ||
  - metric: acc | ||
    aggregation: mean | ||
    higher_is_better: true | ||
  - metric: acc_norm | ||
    aggregation: mean | ||
    higher_is_better: true | ||
num_fewshot: 2 | ||
metadata: | ||
  version: 0.0 | ||
|
||
# Please set seed as 42 |
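The seed note matters because this config uses `num_fewshot: 2` with a `training_split`, so the few-shot examples are presumably sampled from the training documents. The sketch below shows why a fixed seed makes that selection repeatable. `sample_fewshot` is a hypothetical helper, not the harness's sampler, and which seed option the author has in mind is not stated in the PR.

```python
import random


def sample_fewshot(train_docs, k, seed):
    """Pick k few-shot examples with a dedicated, seeded random generator."""
    rng = random.Random(seed)
    return rng.sample(train_docs, k)


# Placeholder training documents.
train_docs = [f"doc-{i}" for i in range(10)]

first = sample_fewshot(train_docs, k=2, seed=42)
second = sample_fewshot(train_docs, k=2, seed=42)
# Both calls select the same two documents because the seed is identical.
```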
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
import re | ||
|
||
import datasets | ||
|
||
|
||
|
||
def process_docs(dataset: datasets.Dataset) -> datasets.Dataset: | ||
    def _process_doc(doc): | ||
        query = ( | ||
            f"""concept set: {{{doc['concept_set'].replace("#", ", ")}}}\n""") | ||
        query += "\n".join([f"{i+1}. {doc[str(i+1)]}" for i in range(4)]) | ||
|
||
        out_doc = { | ||
            "query": query, | ||
            "choices": [f"{i+1}. {doc[str(i+1)]}" for i in range(4)], | ||
            # "choices": [f"{doc[str(i+1)]}" for i in range(4)], | ||
Review comment: Same here.
||
            # "choices": [f'{str(i+1)}. ' + doc['{i}'.format(i=i + 1)] for i in range(4)], # The list of choices. | ||
            # "choices": [str(i+1) for i in range(4)], # The list of choices. | ||
            "gold": doc['gold']-1, # The integer used to index into the correct element of `"choices"`. | ||
        } | ||
        return out_doc | ||
|
||
    return dataset.map(_process_doc) |
Review comment: let's remove the `task:` field since this is a template/stub config