This repository has been archived by the owner on Jul 23, 2024. It is now read-only.

[Task Submission] ICL consistency test (icl_consistency_test) #11

Merged: 21 commits merged into GenBench:main on Dec 31, 2023

LucWeber

ICL consistency test

This task tests the consistency of prompt-based model predictions across a wide range of prompt setups and reports accuracy and consistency scores.

Authors

Implementation

No data preprocessing is necessary.
We implemented a custom evaluate_predictions() method that calculates accuracy and consistency scores for each setup separately.

Usage

The custom evaluate_predictions() method accepts inputs in the default format: predictions is a Dict[str, Dict[str, Any]] and gold is a datasets.Dataset. In predictions, the keys of the outer dictionary are the setup_IDs and the keys of each inner dictionary are the corresponding data_IDs. For a complete example evaluation pipeline using Hugging Face, see example_evaluation.py.
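To make the expected shape of predictions concrete, here is a minimal sketch under stated assumptions. It is not the task's actual evaluate_predictions() implementation: the setup and data IDs, the `data_ID` and `target` field names, and both helper functions are hypothetical. A plain list of dicts stands in for the datasets.Dataset so the snippet runs without extra dependencies, and a simple pairwise agreement rate stands in for the consistency score, which the task may compute differently.

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple

# Hypothetical predictions: outer keys are setup_IDs, inner keys are data_IDs.
predictions: Dict[str, Dict[str, Any]] = {
    "setup_A": {"d0": 1, "d1": 0, "d2": 1},
    "setup_B": {"d0": 1, "d1": 1, "d2": 1},
}

# Stand-in for the gold datasets.Dataset; the field names are assumptions.
gold: List[Dict[str, Any]] = [
    {"data_ID": "d0", "target": 1},
    {"data_ID": "d1", "target": 0},
    {"data_ID": "d2", "target": 0},
]


def accuracy_per_setup(
    predictions: Dict[str, Dict[str, Any]], gold: List[Dict[str, Any]]
) -> Dict[str, float]:
    """Fraction of correct predictions, computed separately for each setup."""
    labels = {row["data_ID"]: row["target"] for row in gold}
    return {
        setup_id: sum(preds[d] == labels[d] for d in preds) / len(preds)
        for setup_id, preds in predictions.items()
    }


def pairwise_agreement(
    predictions: Dict[str, Dict[str, Any]]
) -> Dict[Tuple[str, str], float]:
    """Share of data points on which two setups predict the same label.

    Illustrative stand-in only, not necessarily the task's consistency metric.
    """
    scores = {}
    for a, b in combinations(sorted(predictions), 2):
        shared = predictions[a].keys() & predictions[b].keys()
        scores[(a, b)] = sum(
            predictions[a][d] == predictions[b][d] for d in shared
        ) / len(shared)
    return scores


accuracies = accuracy_per_setup(predictions, gold)
agreements = pairwise_agreement(predictions)
```

Keying the outer dictionary by setup_ID keeps each setup's predictions separable, so per-setup scores and cross-setup comparisons can both be derived from the same structure.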

Checklist:

  • I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
  • Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
  • I have read the description of what should be in the doc.md of my task, and have added the required arguments.
  • I have submitted or will submit an accompanying paper to the GenBench workshop.

@vernadankers

vernadankers commented Sep 1, 2023

Hello!

We are getting quite close to the deadline (September 1, 11:59 PM Anywhere on Earth), so I wanted to remind you that your PR still needs some attention. Please double-check the automated tests, and don't forget to submit your accompanying paper to OpenReview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper, feel free to tag us if you have questions.
Cheers, Verna
On behalf of the GenBench team

@kazemnejad kazemnejad merged commit 382adcf into GenBench:main Dec 31, 2023
3 checks passed

4 participants