This repository has been archived by the owner on Jul 23, 2024. It is now read-only.

[Task Submission] Hate Speech Detection (latent_feature_splits) #37

Merged · 7 commits · Dec 31, 2023
5 changes: 5 additions & 0 deletions src/genbench/tasks/latent_feature_splits/__init__.py
@@ -0,0 +1,5 @@
from genbench import TaskDict


class LatentFeatureSplits(TaskDict):
pass
@@ -0,0 +1,57 @@
{
name: 'Latent Feature Splits (bert_closest_split)',

description: "We split hate speech data based on the internal representations of a RoBERTa model.
The o.o.d. data splits leads to an under-representation of parts of the latent space in the
model's training set, making the split more challenging than a random split.",

keywords: [
'non-i.i.d. generalisation',
'o.o.d. generalisation',
'latent-features',
'hate speech'
],

authors: [
'Maike Züfle',
'Verna Dankers',
'Ivan Titov',
],

data_source: {
type: 'manual',
test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_bert_closest_split_test_new.jsonl',
train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_bert_closest_split_train.jsonl'
},

has_train_set: true,

task_type: 'multiple_choice',

evaluation_metrics: [
{
hf_id: 'accuracy',
best_score: 1.0,
git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
},
{
hf_id: 'f1',
average: 'macro',
best_score: 1.0,
git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
},
],

preparation_strategies: {
// A recipe for preparing the model to perform the task by configuring its prompt.
// This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
// We provide a few options for configuring the prompt. But, the task creator can
// also provide a custom prompt preparation in the task's Python class.
finetuning: {
objective: 'maximum_likelihood',
}
},
}
52 changes: 52 additions & 0 deletions src/genbench/tasks/latent_feature_splits/bert_closest_split/doc.md
@@ -0,0 +1,52 @@
# Hate Speech Detection (bert_closest_split)

## Abstract
With the ever-growing presence of social media platforms comes the increased spread of harmful content and the need for robust hate speech detection systems.
Such systems easily overfit to specific targets and keywords, and evaluating them without considering distribution shifts that might occur between train and test data overestimates their benefit.
We challenge hate speech models via new train-test splits of existing datasets that rely on the clustering of models' hidden representations.
We present two split variants (Subset-Sum-Split and Closest-Split) that, when applied to two datasets using four models, reveal how models catastrophically fail on blind spots in the latent space.
This result generalises when developing a split with one model and evaluating it on another.
Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance, which underscores that task difficulty is not always humanly interpretable.
We recommend incorporating latent feature-based splits in model development and release two splits via the GenBench benchmark.

## Examples
{"input": "wow do not all speak at once niggers", "target": 0, "target_options": ["hate", "noHate", "offensive"]}
{"input": "how long will jews be a majority in israel", "target": 1, "target_options": ["hate", "noHate", "offensive"]}
{"input": "sounds like something a moslem would do", "target": 2, "target_options": ["hate", "noHate", "offensive"]}

## Usage
For the task, the model has to decide whether a social media post contains hate speech, offensive speech, or neither (normal speech).
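
Below is a minimal sketch of loading and preparing this subtask through the GenBench CBT interface. It assumes the standard `load_task` / `get_prepared_datasets` API and that the subtask id is `latent_feature_splits:bert_closest_split`; treat both as assumptions rather than verified usage.

```python
# Sketch only: assumes the GenBench CBT `load_task` API and the subtask id below.
from genbench import load_task
from genbench.api import PreparationStrategy

task = load_task("latent_feature_splits:bert_closest_split")

# The config declares a finetuning preparation strategy, so request the
# finetuning-ready splits as Hugging Face datasets.
prepared = task.get_prepared_datasets(PreparationStrategy.FINETUNING)
print({split: len(ds) for split, ds in prepared.items()})
```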

## Data Source
The dataset was published in `HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection` by Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee (AAAI 2021).

It is licensed under the MIT License:

Copyright (c) 2020 Punyajoy Saha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Limitations and Bias
*Note any known limitations or biases of the Hate Speech Detection task, with links and references if possible.*

## GenBench Eval card
This method can be used to test generalisation in hate speech detection for LLMs (pretrain–test locus).
The split is based on the feature representations of a language model; we therefore assume the shift is a covariate shift. The method assesses the robustness of language models and how well they generalise in out-of-distribution settings.
![GenBench Eval Card](eval_card.png)
@@ -0,0 +1,99 @@
from collections import OrderedDict
from typing import Any, List, Mapping

import datasets
import evaluate

from genbench import Task
from genbench.api import TaskType
from genbench.utils.logging import get_logger


logger = get_logger(__name__)


class LatentFeatureSplitBertClosestSplit(Task):
def evaluate_predictions(
self,
*,
predictions: List[Mapping[str, Any]] = None,
gold: datasets.Dataset = None,
) -> OrderedDict[str, float]:
"""Evaluate the predictions of the model against the gold data.

Args:
predictions: A list of dictionaries, where each dictionary contains the predicted
values for an example. The keys are strings and the values can be any type.
gold: A HuggingFace `datasets.Dataset` object containing the ground truth data for the task.

Returns:
A dictionary containing key-value pairs for the evaluation metric(s) computed on the predicted
values. The keys are strings representing the name of the evaluation metric and the values are
floating-point numbers.

Raises:
ValueError: If a metric returns None.
"""
result = OrderedDict()
for metric_config in self.config.evaluation_metrics:
hf_id = metric_config.hf_id
if isinstance(hf_id, str):
hf_id = [hf_id]

metric = evaluate.load(*hf_id, revision=metric_config.git_commit_sha)

refs_lst = [g["target"] for g in gold]
preds_lst = [pred["target"] for pred in predictions]

ref_type = type(refs_lst[0])
pred_type = type(preds_lst[0])
if pred_type != ref_type:
if self.config.task_type != TaskType.MULTIPLE_CHOICE:
raise ValueError(
f"Predictions and references have different types: preds: {pred_type} and refs: {ref_type}. "
)
# Convert predictions to the same type as the references
if pred_type == str and ref_type == int:
logger.warning("Predictions are strings, but references are ints. Converting predictions to ints.")
converted_preds = []
for pred, ref in zip(preds_lst, gold):
assert "target_options" in ref
converted_preds.append(ref["target_options"].index(pred))
preds_lst = converted_preds
elif pred_type == int and ref_type == str:
logger.warning("Predictions are ints, but references are strings. Converting references to ints.")
converted_refs = []
for pred, ref in zip(preds_lst, gold):
assert "target_options" in ref
converted_refs.append(ref["target_options"].index(ref["target"]))
refs_lst = converted_refs
else:
if self.config.task_type == TaskType.MULTIPLE_CHOICE and pred_type != int:
# Convert both predictions and references to int
logger.warning(
"Predictions and references have the same type, but it is not int. Converting both to int."
)
converted_preds = []
converted_refs = []
for pred, ref in zip(preds_lst, gold):
assert "target_options" in ref
converted_preds.append(ref["target_options"].index(pred))
converted_refs.append(ref["target_options"].index(ref["target"]))
preds_lst = converted_preds
refs_lst = converted_refs

extra_kwargs = metric_config.compute_extra_kwargs or {}
output: dict = metric.compute(predictions=preds_lst, references=refs_lst, **extra_kwargs)

if output is None:
raise ValueError(
f"Metric {metric_config.hf_id} returned None. " f"Please check the metric implementation."
)

# Update output keys to include the metric id
metric_id = "_".join(hf_id)
output = {f"hf_{metric_id}__{k}": v for k, v in output.items()}

result.update(output)

return result
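
To make the conversion and scoring flow in `evaluate_predictions` concrete, here is a small self-contained sketch on toy data. It uses only the Hugging Face `evaluate` library and mirrors the branch that maps string predictions to integer indices via each example's `target_options`; it is illustrative and not part of the submitted task code.

```python
# Toy sketch mirroring evaluate_predictions above: string predictions are
# mapped to integer label indices via each example's target_options, then scored.
import evaluate

gold = [
    {"target": 0, "target_options": ["hate", "noHate", "offensive"]},
    {"target": 1, "target_options": ["hate", "noHate", "offensive"]},
]
predictions = [{"target": "hate"}, {"target": "offensive"}]

refs_lst = [g["target"] for g in gold]
preds_lst = [g["target_options"].index(p["target"]) for p, g in zip(predictions, gold)]

accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=preds_lst, references=refs_lst))  # {'accuracy': 0.5}
```
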
58 changes: 58 additions & 0 deletions src/genbench/tasks/latent_feature_splits/config.jsonnet
@@ -0,0 +1,58 @@
{
name: 'Latent Feature Splits',

description: "We split hate speech data based on the internal representations of a RoBERTa model.
The o.o.d. data splits leads to an under-representation of parts of the latent space in the
model's training set, making the split more challenging than a random split.",

keywords: [
'non-i.i.d. generalisation',
'o.o.d. generalisation',
'latent-features',
'hate speech'
],

authors: [
'Maike Züfle',
'Verna Dankers',
'Ivan Titov',
],

data_source: {
type: 'manual',
test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_test.jsonl',
train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_train.jsonl'
},

has_train_set: true,

task_type: 'multiple_choice',

evaluation_metrics: [
{
hf_id: 'accuracy',
best_score: 1.0,
git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
},
{
hf_id: 'f1',
average: 'macro',
best_score: 1.0,
git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
},
],

preparation_strategies: {
// A recipe for preparing the model to perform the task by configuring its prompt.
// This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
// We provide a few options for configuring the prompt. But, the task creator can
// also provide a custom prompt preparation in the task's Python class.
finetuning: {
objective: 'maximum_likelihood',
}
},
}
52 changes: 52 additions & 0 deletions src/genbench/tasks/latent_feature_splits/doc.md
@@ -0,0 +1,52 @@
# Hate Speech Detection

## Abstract
With the ever-growing presence of social media platforms comes the increased spread of harmful content and the need for robust hate speech detection systems.
Such systems easily overfit to specific targets and keywords, and evaluating them without considering distribution shifts that might occur between train and test data overestimates their benefit.
We challenge hate speech models via new train-test splits of existing datasets that rely on the clustering of models' hidden representations.
We present two split variants (Subset-Sum-Split and Closest-Split) that, when applied to two datasets using four models, reveal how models catastrophically fail on blind spots in the latent space.
This result generalises when developing a split with one model and evaluating it on another.
Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance, which underscores that task difficulty is not always humanly interpretable.
We recommend incorporating latent feature-based splits in model development and release two splits via the GenBench benchmark.

## Examples
{"input": "wow do not all speak at once niggers", "target": 0, "target_options": ["hate", "noHate", "offensive"]}
{"input": "how long will jews be a majority in israel", "target": 1, "target_options": ["hate", "noHate", "offensive"]}
{"input": "sounds like something a moslem would do", "target": 2, "target_options": ["hate", "noHate", "offensive"]}

## Usage
For the task, the model has to decide whether a social media post contains hate speech, offensive speech, or neither (normal speech).
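
Independently of the GenBench tooling, the splits referenced in the task configs can be loaded straight from their raw JSONL URLs. The sketch below uses the Hugging Face `datasets` library and the roberta_closest_split test URL listed in the config; it is an illustration, not part of the task code.

```python
# Sketch: load one published split directly from its raw JSONL URL,
# bypassing the GenBench task classes. Only the `datasets` library is assumed.
from datasets import load_dataset

test_url = (
    "https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/"
    "genbench_splits/hatexplain_roberta_closest_split_test.jsonl"
)
test_set = load_dataset("json", data_files={"test": test_url}, split="test")
print(test_set[0])  # fields: "input" (str), "target" (int), "target_options" (list of str)
```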

## Data Source
The dataset was published in `HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection` by Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee (AAAI 2021).

It is licensed under the MIT License:

Copyright (c) 2020 Punyajoy Saha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Limitations and Bias
*Note any known limitations or biases of the Hate Speech Detection task, with links and references if possible.*

## GenBench Eval card
This method can be used to test generalisation in hate speech detection for LLMs (pretrain–test locus).
The split is based on the feature representations of a language model; we therefore assume the shift is a covariate shift. The method assesses the robustness of language models and how well they generalise in out-of-distribution settings.
![GenBench Eval Card](eval_card.png)
@@ -0,0 +1,57 @@
{
name: 'Latent Feature Splits (roberta_closest_split)',

description: "We split hate speech data based on the internal representations of a RoBERTa model.
The o.o.d. data splits leads to an under-representation of parts of the latent space in the
model's training set, making the split more challenging than a random split.",

keywords: [
'non-i.i.d. generalisation',
'o.o.d. generalisation',
'latent-features',
'hate speech'
],

authors: [
'Maike Züfle',
'Verna Dankers',
'Ivan Titov',
],

data_source: {
type: 'manual',
test: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_test.jsonl',
train: 'https://raw.githubusercontent.com/MaikeZuefle/Latent-Feature-Splits/main/genbench_splits/hatexplain_roberta_closest_split_train.jsonl'
},

has_train_set: true,

task_type: 'multiple_choice',

evaluation_metrics: [
{
hf_id: 'accuracy',
best_score: 1.0,
git_commit_sha: '34d6add55811828baef83e0d7c6826e2193f7b6a',
},
{
hf_id: 'f1',
average: 'macro',
best_score: 1.0,
git_commit_sha: '3a4c40f7397dcd7d9dccf0659616dc6b14072dcb',
},
],

preparation_strategies: {
// A recipe for preparing the model to perform the task by configuring its prompt.
// This recipe is suitable for generative LMs such as GPT-3, OPT, T5, etc.
// We provide a few options for configuring the prompt. But, the task creator can
// also provide a custom prompt preparation in the task's Python class.
finetuning: {
objective: 'maximum_likelihood',
}
},
}