[Task Submission] ICL consistency test (`icl_consistency_test`) #11

LucWeber · 2023-07-28T09:06:59Z

ICL consistency test

This task tests the consistency of prompt-based model predictions across a wide range of prompt setups and calculates accuracy and consistency scores.

Authors

Lucas Weber [email protected]
Elia Bruni [email protected]
Dieuwke Hupkes [email protected]

Implementation

No data preprocessing is necessary.
We implemented a custom evaluate_predictions() method that calculates accuracy and consistency scores for each setup separately.
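As an illustration of what a consistency score between two setups could look like, here is a minimal sketch. The commit history of this PR mentions kappa, but the description does not say which variant is used or how it is aggregated, so Cohen's kappa is an assumption here, and the helper name cohen_kappa is hypothetical rather than part of the task's code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equally long label sequences (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which both setups predict the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each setup's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both setups predict one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Predictions of two hypothetical setups on the same four data points.
kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

Kappa corrects raw agreement for the agreement expected by chance, so two setups that both predict the majority label most of the time are not automatically rated as consistent.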

Usage

The custom evaluate_predictions() method accepts inputs in the default format: predictions is a Dict[str, Dict[str, Any]] and gold is a datasets.Dataset. In predictions, the keys of the outer dictionary are the setup_IDs and the keys of the inner dictionary are the respective data_IDs. For a fully implemented example evaluation pipeline using Hugging Face, see example_evaluation.py.
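The nested layout of predictions can be sketched as follows. The setup_IDs, data_IDs, and labels are made up for illustration, gold is simplified to a plain dictionary instead of a datasets.Dataset, and accuracy_per_setup is a hypothetical helper, not the task's evaluate_predictions() method.

```python
from typing import Any, Dict

# Hypothetical predictions: outer keys are setup_IDs, inner keys are data_IDs.
predictions: Dict[str, Dict[str, Any]] = {
    "setup_A": {"d0": 1, "d1": 0, "d2": 1},
    "setup_B": {"d0": 1, "d1": 1, "d2": 1},
}

# Simplified stand-in for the gold labels, keyed by data_ID.
# The actual task expects a datasets.Dataset here.
gold: Dict[str, Any] = {"d0": 1, "d1": 0, "d2": 0}


def accuracy_per_setup(
    preds: Dict[str, Dict[str, Any]], gold_labels: Dict[str, Any]
) -> Dict[str, float]:
    """Compute one accuracy score per setup_ID (hypothetical helper)."""
    return {
        setup_id: sum(pred == gold_labels[data_id] for data_id, pred in by_data.items())
        / len(by_data)
        for setup_id, by_data in preds.items()
    }


scores = accuracy_per_setup(predictions, gold)
```

Keying the inner dictionaries by data_ID means that predictions from different setups can be aligned item by item, which is what a per-item consistency comparison needs.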

Checklist:

I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
I have read the description of what should be in the doc.md of my task, and have added the required arguments.
I have submitted or will submit an accompanying paper to the GenBench workshop.

vernadankers · 2023-09-01T10:37:13Z

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), which is why I wanted to remind you that your PR still needs some attention. Please double-check the automated tests, and don't forget to submit your accompanying paper to OpenReview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper, feel free to tag us if you have questions.
Cheers, Verna
On behalf of the GenBench team

LucWeber added 7 commits July 14, 2023 18:28

Add ICL consistency test

fc84f09

Implement kappa, write doc, update config, create eval card, create test-script

4071f1f

..

c9a97e2

..

7978de2

..

ad27915

..

2720c09

Style and quality checks

31b09af

kazemnejad added the task-submission label Jul 29, 2023

LucWeber and others added 2 commits August 22, 2023 18:23

Add size information, fix typo

dc620af

Merge branch 'main' into ICL_consistency_test

73f8bca

kazemnejad added task-submission and removed task-submission labels Aug 23, 2023

LucWeber added 8 commits August 30, 2023 11:52

Add doc-string

ad6f5dd

Rename factor "Instruction quality"

06a66c2

Add kappa_avg as metric output

5e7354e

Add main effects of factors as metric

b198294

Refactor

c99eba6

Merge branch 'ICL_consistency_test' of https://github.com/LucWeber/genbench_cbt into ICL_consistency_test

b298358

Split datasets into subtasks

810b129

Final changes

7d738e6

vernadankers added task-submission and removed task-submission labels Sep 1, 2023

LucWeber added 3 commits September 1, 2023 15:10

Style and quality check

a543041

Style and quality check

65fb871

Update abstract and eval card; Cosmetics

5eb610c

kazemnejad added task-submission and removed task-submission labels Sep 4, 2023

kazemnejad added the ready-to-be-merged label Nov 16, 2023

Merge branch 'main' into ICL_consistency_test

f64491b

kazemnejad added task-submission and removed task-submission labels Dec 31, 2023

kazemnejad merged commit 382adcf into GenBench:main Dec 31, 2023
3 checks passed
