This repository has been archived by the owner on Jul 23, 2024. It is now read-only.
Merge pull request #37 from MaikeZuefle/latent_feature_splits
[Task Submission] Hate Speech Detection (`latent_feature_splits`)
Showing 15 changed files with 630 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -0,0 +1,5 @@
from genbench import TaskDict


class LatentFeatureSplits(TaskDict):
    pass
Empty file.
57 changes: 57 additions & 0 deletions
src/genbench/tasks/latent_feature_splits/bert_closest_split/config.jsonnet
@@ -0,0 +1,57 @@
{
    name: 'Latent Feature Splits (bert_closest_split)',

    // @TODO: Add a description of the task
    description: "We split hate speech data based on the internal representations of a BERT model.
                  The o.o.d. data split leads to an under-representation of parts of the latent space in the
                  model's training set, making the split more challenging than a random split.",

    // @TODO: Add a list of keywords that describe the task
    keywords: [
        'non-i.i.d. generalisation',
        'o.o.d. generalisation',
        'latent-features',
        'hate speech'
    ],

    authors: [
        'Maike Züfle',
        'Verna Dankers',
        'Ivan Titov',

    ],

    data_source: {
        type: 'manual',
        test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_bert_closest_split_test_new.jsonl',
        train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_bert_closest_split_train.jsonl'
    },

    has_train_set: true,

    task_type: 'multiple_choice',

    evaluation_metrics: [
        {
            hf_id: 'accuracy',
            best_score: 1.0,
            git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
        },
        {
            hf_id: 'f1',
            average: 'macro',
            best_score: 1.0,
            git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
        },
    ],

    preparation_strategies: {
        // A recipe for preparing the model to perform the task by configuring its prompt.
        // This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
        // We provide a few options for configuring the prompt. But, the task creator can
        // also provide a custom prompt preparation in the task's Python class.
        finetuning: {
            objective: 'maximum_likelihood',
        }
    },
}
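The config above scores the task with accuracy and macro-averaged F1. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch. It is not the Hugging Face `evaluate` implementation pinned by `git_commit_sha`; the function names are illustrative, and scoring a class with no true or predicted instances as 0 is an assumption of this sketch.

```python
from typing import List


def accuracy(preds: List[int], refs: List[int]) -> float:
    """Fraction of examples whose predicted label index equals the reference."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def macro_f1(preds: List[int], refs: List[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for p, r in zip(preds, refs) if p == c and r == c)
        fp = sum(1 for p, r in zip(preds, refs) if p == c and r != c)
        fn = sum(1 for p, r in zip(preds, refs) if p != c and r == c)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy label indices: 0 = "hate", 1 = "noHate", 2 = "offensive".
refs = [0, 1, 2, 1]
preds = [0, 1, 1, 1]
print(accuracy(preds, refs), macro_f1(preds, refs, num_classes=3))
```

Because macro averaging weights every class equally, a model that never predicts one of the three labels is penalised even when its accuracy stays high.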
52 changes: 52 additions & 0 deletions
src/genbench/tasks/latent_feature_splits/bert_closest_split/doc.md
@@ -0,0 +1,52 @@
# Hate Speech Detection (bert_closest_split)

## Abstract
With the ever-growing presence of social media platforms comes the increased spread of harmful content and the need for robust hate speech detection systems.
Such systems easily overfit to specific targets and keywords, and evaluating them without considering distribution shifts that might occur between train and test data overestimates their benefit.
We challenge hate speech models via new train-test splits of existing datasets that rely on the clustering of models' hidden representations.
We present two split variants (Subset-Sum-Split and Closest-Split) that, when applied to two datasets using four models, reveal how models catastrophically fail on blind spots in the latent space.
This result generalises when developing a split with one model and evaluating it on another.
Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance, which underscores that task difficulty is not always humanly interpretable.
We recommend incorporating latent feature-based splits in model development and release two splits via the GenBench benchmark.

## Examples
{"input": "wow do not all speak at once niggers", "target": 0, "target_options": ["hate", "noHate", "offensive"]} | ||
{"input": "how long will jews be a majority in israel", "target": 1, "target_options": ["hate", "noHate", "offensive"]} | ||
{"input": "sounds like something a moslem would do", "target": 2, "target_options": ["hate", "noHate", "offensive"]} | ||

## Usage
For this task, the model has to decide whether a social media post contains hate speech, contains offensive speech, or is normal (neither).
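
As a small illustration of the record format shown under Examples, the sketch below parses one JSONL line and maps the integer `target` to its label name via `target_options`. Only the field names come from the examples; the input text and the helper `decode_label` are invented for illustration.

```python
import json

# Hypothetical record in the same shape as the task's JSONL examples;
# the input text is made up for this sketch.
line = '{"input": "what a lovely morning", "target": 1, "target_options": ["hate", "noHate", "offensive"]}'


def decode_label(record: dict) -> str:
    """Return the label name that the integer `target` points to."""
    return record["target_options"][record["target"]]


record = json.loads(line)
label = decode_label(record)
print(label)
```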

## Data Source
The dataset was published in `HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection` by Binny Mathew, Punyajoy Saha,
Seid Muhie Yimam, Chris Biemann, Pawan Goyal and Animesh Mukherjee in 2021. It was accepted at AAAI 2021.

It is licensed under the MIT License:

Copyright (c) 2020 Punyajoy Saha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Limitations and Bias
*Note any known limitations or biases that the Hate Speech Detection task has, with links and references if possible.*

## GenBench Eval card
This method can be used to test generalisation in hate speech detection for LLMs (pretrain-test locus).
The split is based on the feature representations of a language model; we therefore assume that the shift is a covariate shift. The method assesses the robustness of language models and how well they generalise in out-of-distribution settings.
![GenBench Eval Card](eval_card.png)
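
To make the idea of a split over latent features more concrete, here is a toy sketch of a generic cluster-based hold-out. It is not the Subset-Sum-Split or Closest-Split procedure described in the abstract: the 2-D points stand in for a model's hidden representations, the centroids are given rather than learned, and all function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points: List[Vector], centroids: List[Vector]) -> List[int]:
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def cluster_holdout(
    points: List[Vector], centroids: List[Vector], test_clusters: Set[int]
) -> Tuple[List[int], List[int]]:
    """Put every example from the chosen clusters in the test set, the rest in train."""
    assignment = assign_clusters(points, centroids)
    train = [i for i, c in enumerate(assignment) if c not in test_clusters]
    test = [i for i, c in enumerate(assignment) if c in test_clusters]
    return train, test


# Toy stand-ins for hidden representations (assumed, not real model outputs).
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.9, 0.0)]
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
train_idx, test_idx = cluster_holdout(points, centroids, test_clusters={1})
print(train_idx, test_idx)
```

Holding out whole clusters means a region of the representation space is absent from training, which is the kind of covariate shift the eval card refers to.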
Binary file added (+172 KB)
src/genbench/tasks/latent_feature_splits/bert_closest_split/eval_card.png
99 changes: 99 additions & 0 deletions
src/genbench/tasks/latent_feature_splits/bert_closest_split/task.py
@@ -0,0 +1,99 @@
from collections import OrderedDict
from typing import Any, List, Mapping

import datasets
import evaluate

from genbench import Task
from genbench.api import TaskType
from genbench.utils.logging import get_logger


logger = get_logger(__name__)


class LatentFeatureSplitBertClosestSplit(Task):
    def evaluate_predictions(
        self,
        *,
        predictions: List[Mapping[str, Any]] = None,
        gold: datasets.Dataset = None,
    ) -> OrderedDict[str, float]:
        """Evaluate the predictions of the model against the gold data.
        Args:
            predictions: A list of dictionaries, where each dictionary contains the predicted
                values for an example. The keys are strings and the values can be any type.
            gold: A HuggingFace `datasets.Dataset` object containing the ground truth data for the task.
        Returns:
            A dictionary containing key-value pairs for the evaluation metric(s) computed on the predicted
            values. The keys are strings representing the name of the evaluation metric and the values are
            floating-point numbers.
        Raises:
            ValueError: If a metric returns None.
        """
        result = OrderedDict()
        for metric_config in self.config.evaluation_metrics:
            hf_id = metric_config.hf_id
            if isinstance(hf_id, str):
                hf_id = [hf_id]

            metric = evaluate.load(*hf_id, revision=metric_config.git_commit_sha)

            refs_lst = [g["target"] for g in gold]
            preds_lst = [pred["target"] for pred in predictions]

            ref_type = type(refs_lst[0])
            pred_type = type(preds_lst[0])
            if pred_type != ref_type:
                if self.config.task_type != TaskType.MULTIPLE_CHOICE:
                    raise ValueError(
                        f"Predictions and references have different types: preds: {pred_type} and refs: {ref_type}. "
                    )
                # Convert predictions to the same type as the references
                if pred_type == str and ref_type == int:
                    logger.warning("Predictions are strings, but references are ints. Converting predictions to ints.")
                    converted_preds = []
                    for pred, ref in zip(preds_lst, gold):
                        assert "target_options" in ref
                        converted_preds.append(ref["target_options"].index(pred))
                    preds_lst = converted_preds
                elif pred_type == int and ref_type == str:
                    logger.warning("Predictions are ints, but references are strings. Converting references to ints.")
                    converted_refs = []
                    for pred, ref in zip(preds_lst, gold):
                        assert "target_options" in ref
                        converted_refs.append(ref["target_options"].index(ref["target"]))
                    refs_lst = converted_refs
            else:
                if self.config.task_type == TaskType.MULTIPLE_CHOICE and pred_type != int:
                    # Convert both predictions and references to int
                    logger.warning(
                        "Predictions and references have the same type, but it is not int. Converting both to int."
                    )
                    converted_preds = []
                    converted_refs = []
                    for pred, ref in zip(preds_lst, gold):
                        assert "target_options" in ref
                        converted_preds.append(ref["target_options"].index(pred))
                        converted_refs.append(ref["target_options"].index(ref["target"]))
                    preds_lst = converted_preds
                    refs_lst = converted_refs

            extra_kwargs = metric_config.compute_extra_kwargs or {}
            output: dict = metric.compute(predictions=preds_lst, references=refs_lst, **extra_kwargs)

            if output is None:
                raise ValueError(
                    f"Metric {metric_config.hf_id} returned None. " f"Please check the metric implementation."
                )

            # Update output keys to include the metric id
            metric_id = "_".join(hf_id)
            output = {f"hf_{metric_id}__{k}": v for k, v in output.items()}

            result.update(output)

        return result
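
The type reconciliation performed in `evaluate_predictions` can be looked at in isolation. The following is a simplified stand-alone sketch, not the GenBench code path: `to_indices` is a hypothetical helper, and plain dicts stand in for rows of a `datasets.Dataset`.

```python
from typing import Any, List, Mapping, Union


def to_indices(values: List[Union[int, str]], gold: List[Mapping[str, Any]]) -> List[int]:
    """Map label strings to their position in each example's `target_options`.

    Integer values are passed through unchanged.
    """
    indices = []
    for value, example in zip(values, gold):
        if isinstance(value, str):
            # list.index raises ValueError if the string is not a listed option.
            indices.append(example["target_options"].index(value))
        else:
            indices.append(value)
    return indices


options = ["hate", "noHate", "offensive"]
gold = [
    {"target": 1, "target_options": options},
    {"target": 2, "target_options": options},
]
# Predictions given as label strings, as a generative model might return them.
predictions = [{"target": "noHate"}, {"target": "hate"}]

pred_ids = to_indices([p["target"] for p in predictions], gold)
ref_ids = to_indices([g["target"] for g in gold], gold)
print(pred_ids, ref_ids)
```

Once both lists hold integer indices, they can be passed to a metric that expects class ids. A prediction string that is not one of the listed options raises `ValueError` in this sketch, so callers may want to handle that case explicitly.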
@@ -0,0 +1,58 @@
{
    name: 'Latent Feature Split',

    // @TODO: Add a description of the task
    description: "We split hate speech data based on the internal representations of a RoBERTa model.
                  The o.o.d. data split leads to an under-representation of parts of the latent space in the
                  model's training set, making the split more challenging than a random split.",

    // @TODO: Add a list of keywords that describe the task
    keywords: [
        'non-i.i.d. generalisation',
        'o.o.d. generalisation',
        'latent-features',
        'hate speech'
    ],

    authors: [
        'Maike Züfle',
        'Verna Dankers',
        'Ivan Titov',

    ],

    data_source: {
        type: 'manual',
        test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_test.jsonl',
        train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_train.jsonl'
    },

    has_train_set: true,

    task_type: 'multiple_choice',

    evaluation_metrics: [
        {
            hf_id: 'accuracy',
            best_score: 1.0,
            git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
        },
        {
            hf_id: 'f1',
            average: 'macro',
            best_score: 1.0,
            git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
        },

    ],

    preparation_strategies: {
        // A recipe for preparing the model to perform the task by configuring its prompt.
        // This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
        // We provide a few options for configuring the prompt. But, the task creator can
        // also provide a custom prompt preparation in the task's Python class.
        finetuning: {
            objective: 'maximum_likelihood',
        }
    },
}
@@ -0,0 +1,52 @@
# Hate Speech Detection

## Abstract
With the ever-growing presence of social media platforms comes the increased spread of harmful content and the need for robust hate speech detection systems.
Such systems easily overfit to specific targets and keywords, and evaluating them without considering distribution shifts that might occur between train and test data overestimates their benefit.
We challenge hate speech models via new train-test splits of existing datasets that rely on the clustering of models' hidden representations.
We present two split variants (Subset-Sum-Split and Closest-Split) that, when applied to two datasets using four models, reveal how models catastrophically fail on blind spots in the latent space.
This result generalises when developing a split with one model and evaluating it on another.
Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance, which underscores that task difficulty is not always humanly interpretable.
We recommend incorporating latent feature-based splits in model development and release two splits via the GenBench benchmark.

## Examples
{"input": "wow do not all speak at once niggers", "target": 0, "target_options": ["hate", "noHate", "offensive"]} | ||
{"input": "how long will jews be a majority in israel", "target": 1, "target_options": ["hate", "noHate", "offensive"]} | ||
{"input": "sounds like something a moslem would do", "target": 2, "target_options": ["hate", "noHate", "offensive"]} | ||

## Usage
For this task, the model has to decide whether a social media post contains hate speech, contains offensive speech, or is normal (neither).

## Data Source
The dataset was published in `HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection` by Binny Mathew, Punyajoy Saha,
Seid Muhie Yimam, Chris Biemann, Pawan Goyal and Animesh Mukherjee in 2021. It was accepted at AAAI 2021.

It is licensed under the MIT License:

Copyright (c) 2020 Punyajoy Saha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Limitations and Bias
*Note any known limitations or biases that the Hate Speech Detection task has, with links and references if possible.*

## GenBench Eval card
This method can be used to test generalisation in hate speech detection for LLMs (pretrain-test locus).
The split is based on the feature representations of a language model; we therefore assume that the shift is a covariate shift. The method assesses the robustness of language models and how well they generalise in out-of-distribution settings.
![GenBench Eval Card](eval_card.png)
Empty file.
57 changes: 57 additions & 0 deletions
src/genbench/tasks/latent_feature_splits/roberta_closest_split/config.jsonnet
@@ -0,0 +1,57 @@
{
    name: 'Latent Feature Splits (roberta_closest_split)',

    // @TODO: Add a description of the task
    description: "We split hate speech data based on the internal representations of a RoBERTa model.
                  The o.o.d. data split leads to an under-representation of parts of the latent space in the
                  model's training set, making the split more challenging than a random split.",

    // @TODO: Add a list of keywords that describe the task
    keywords: [
        'non-i.i.d. generalisation',
        'o.o.d. generalisation',
        'latent-features',
        'hate speech'
    ],

    authors: [
        'Maike Züfle',
        'Verna Dankers',
        'Ivan Titov',

    ],

    data_source: {
        type: 'manual',
        test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_test.jsonl',
        train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_train.jsonl'
    },

    has_train_set: true,

    task_type: 'multiple_choice',

    evaluation_metrics: [
        {
            hf_id: 'accuracy',
            best_score: 1.0,
            git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
        },
        {
            hf_id: 'f1',
            average: 'macro',
            best_score: 1.0,
            git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
        },
    ],

    preparation_strategies: {
        // A recipe for preparing the model to perform the task by configuring its prompt.
        // This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
        // We provide a few options for configuring the prompt. But, the task creator can
        // also provide a custom prompt preparation in the task's Python class.
        finetuning: {
            objective: 'maximum_likelihood',
        }
    },
}