[Task Submission] Hate Speech Detection (`latent_feature_splits`) #37

MaikeZuefle · 2023-11-15T21:50:30Z

Latent Feature-based Data Splits

This project aims to go beyond the random train-test split by developing a more challenging data-splitting process to better evaluate generalisation performance.
We rely on a model's internal representations to create the split: we cluster these representations and assign each cluster, as a whole, to either the train or the test set.
Hate speech detection is used as the testing ground for developing the splitting method.
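The splitting procedure described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch and not the authors' implementation. It assumes the internal representations have already been extracted as fixed-size vectors (for example, one hidden-state vector per example), uses plain k-means with deterministic farthest-point initialisation as the clustering step (the description does not name a clustering algorithm), and moves whole clusters into the test set, smallest first, until a target test fraction is reached. The names `kmeans` and `latent_feature_split` and the `test_fraction` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns one cluster id (0..k-1) per input point.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Initialisation: start from the first point, then repeatedly add the
    # point that is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def latent_feature_split(
    embeddings: List[Vector], k: int = 4, test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices); every cluster lies entirely
    on one side of the split."""
    labels = kmeans(embeddings, k)
    sizes = {c: labels.count(c) for c in set(labels)}
    target = math.ceil(test_fraction * len(embeddings))
    test_clusters: Set[int] = set()
    n_test = 0
    # Assumption: fill the test set with the smallest clusters first.
    for c in sorted(sizes, key=lambda c: (sizes[c], c)):
        if n_test >= target:
            break
        test_clusters.add(c)
        n_test += sizes[c]
    train_idx = [i for i, lab in enumerate(labels) if lab not in test_clusters]
    test_idx = [i for i, lab in enumerate(labels) if lab in test_clusters]
    return train_idx, test_idx
```

Holding out whole clusters is what is meant to make this split harder than a random one: test examples come from a region of the representation space that has no members in the training set. In practice, the toy clustering above could be swapped for a library implementation such as scikit-learn's `KMeans`.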

Authors

Maike Züfle
Verna Dankers
Ivan Titov

Checklist:

I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
I have read the description of what should be in the doc.md of my task, and have added the required arguments.
I have submitted or will submit an accompanying paper to the GenBench workshop.

Latent Feature Splits

3acd9e4

kazemnejad changed the title ~~Latent Feature Splits~~ [Task Submission] Hate Speech Detection (latent_feature_split) Nov 16, 2023

Merge branch 'main' into latent_feature_splits

44ec1ca

kazemnejad added the task-submission label Nov 16, 2023

vernadankers added 3 commits November 19, 2023 16:28

Minimal example for data loading, training and evaluating with our latent feature-based splits

86e123d

Add batch size to usage_example and move tokenize_function into main

ad31c99

Add validation split, replace checkpoint name with bert-base

32b5f34

vernadankers added task-submission and removed task-submission labels Nov 20, 2023

vernadankers closed this Nov 20, 2023

vernadankers deleted the latent_feature_splits branch November 20, 2023 17:26

vernadankers restored the latent_feature_splits branch November 20, 2023 17:29

vernadankers reopened this Nov 20, 2023

vernadankers changed the title ~~[Task Submission] Hate Speech Detection (latent_feature_split)~~ [Task Submission] Hate Speech Detection (latent_feature_splits) Nov 20, 2023

Fixed style errors in usage_example

bb2829e

vernadankers added task-submission and removed task-submission labels Nov 20, 2023

kazemnejad added the ready-to-be-merged label Nov 29, 2023

Merge branch 'main' into latent_feature_splits

0450032

kazemnejad added task-submission and removed task-submission labels Dec 31, 2023

kazemnejad marked this pull request as ready for review December 31, 2023 02:00

kazemnejad merged commit 657d531 into GenBench:main Dec 31, 2023
3 checks passed
