Commit
Merge pull request #999 from EleutherAI/port_master_squadv2
lintangsutawika authored Nov 17, 2023
2 parents f7ba0d6 + f40b7d0 commit 84c2cda
Showing 3 changed files with 55 additions and 15 deletions.
2 changes: 1 addition & 1 deletion lm_eval/tasks/scrolls/README.md
@@ -28,4 +28,4 @@ Once the subset task class has been defined in this file, it can be used by adding it
to `lm_eval/tasks/__init__.py`.

NOTE: GovReport may need `max_gen_toks` set larger for causal models.
"""
"""
54 changes: 54 additions & 0 deletions lm_eval/tasks/squadv2/README.md
@@ -0,0 +1,54 @@
# SQuAD 2.0

### Paper

Title: `Know What You Don’t Know: Unanswerable Questions for SQuAD`
Abstract: https://arxiv.org/abs/1806.03822

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset,
consisting of questions posed by crowdworkers on a set of Wikipedia articles,
where the answer to every question is a segment of text, or span, from the
corresponding reading passage, or the question might be unanswerable.
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable
questions written adversarially by crowdworkers to look similar to answerable ones.
To do well on SQuAD2.0, systems must not only answer questions when possible, but
also determine when no answer is supported by the paragraph and abstain from answering.

Homepage: https://rajpurkar.github.io/SQuAD-explorer/
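
The raw data can be inspected with the Hugging Face `datasets` library. The snippet below is a minimal sketch (assuming the public `squad_v2` dataset on the Hugging Face Hub) that counts how many validation questions are unanswerable.

```python
# Minimal sketch: unanswerable SQuAD 2.0 questions carry an empty `answers["text"]` list.
from datasets import load_dataset

validation = load_dataset("squad_v2", split="validation")
unanswerable = sum(1 for ex in validation if len(ex["answers"]["text"]) == 0)
print(f"{len(validation)} questions, {unanswerable} unanswerable")
```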


### Citation

```
@misc{rajpurkar2018know,
    title={Know What You Don't Know: Unanswerable Questions for SQuAD},
    author={Pranav Rajpurkar and Robin Jia and Percy Liang},
    year={2018},
    eprint={1806.03822},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

### Groups and Tasks

#### Groups

* Not part of a group yet

#### Tasks

* `squadv2`: the default SQuAD 2.0 task, scored with the official SQuAD v2 exact-match and F1 metrics
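
To sanity-check the ported task end to end, it can be invoked through the harness's Python entry point. The call below is a sketch only: `gpt2` is an arbitrary small checkpoint chosen for illustration, and the exact `simple_evaluate` keyword arguments may differ between harness versions.

```python
# Sketch: evaluate the squadv2 task with a small Hugging Face model.
from lm_eval.evaluator import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["squadv2"],
)
print(results["results"]["squadv2"])
```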

### Checklist

For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?


If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
14 changes: 0 additions & 14 deletions lm_eval/tasks/squadv2/task.py
@@ -48,20 +48,6 @@ def _squad_agg(key, items):
    return _squad_metric(predictions=predictions, references=references).get(key, 0)


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    def remove_articles(text):
        regex = re.compile(r'\b(a|an|the)\b', re.UNICODE)
        return re.sub(regex, ' ', text)
    def white_space_fix(text):
        return ' '.join(text.split())
    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)
    def lower(text):
        return text.lower()
    return white_space_fix(remove_articles(remove_punc(lower(s))))

@register_task("squadv2")
class SQuAD2(Task):
    VERSION = 1

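The deleted `normalize_answer` helper is the answer-normalization routine from the official SQuAD evaluation script (lowercasing and stripping punctuation, articles, and extra whitespace). The `squad_v2` metric in the Hugging Face `evaluate` package, which the remaining `_squad_metric` helper appears to wrap, performs the same normalization internally, so a local copy adds nothing. A minimal sketch of that metric's interface, with purely illustrative ids, texts, and offsets:

```python
# Sketch of the `evaluate` squad_v2 metric interface; all values are illustrative.
import evaluate

squad_v2_metric = evaluate.load("squad_v2")
scores = squad_v2_metric.compute(
    predictions=[
        {"id": "q1", "prediction_text": "France", "no_answer_probability": 0.0}
    ],
    references=[
        {"id": "q1", "answers": {"text": ["France"], "answer_start": [159]}}
    ],
)
print(scores["exact"], scores["f1"])  # HasAns/NoAns breakdowns are also reported
```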