This repository contains the data and supplementary materials for our ParlaCLARIN-2024 paper.
It includes the GePaDe-ORL corpus, which provides manual annotations of subjective expressions together with their opinion holders and targets.
The data is available in JSON format.
The JSON dictionary contains the annotations for 13,222 sentences/clauses with 3,322 subjective expressions. For each sentence, we provide the list of tokens (word forms), the list of lemmas (automatically predicted using spaCy), and an annotation dictionary. The annotation dictionary encodes whether the sentence includes a subjective expression and, if so, the token position of the subjective expression, its view (Agent, Patient or Speaker) and a list of role annotations with one entry per sentence token.
Example:
"20003_Zusatzpunkt_2_FDP_Brandenburg_ID20306600_18.11.2021-5": {
"words": [
"Sie",
"litten",
"oftmals",
"unter",
"sozialer",
"Isolation",
"und",
"unter",
"Bewegungsmangel",
"."
],
"lemmas": [
"sie",
"leiden",
"oftmals",
"unter",
"sozial",
"Isolation",
"und",
"unter",
"Bewegungsmangel",
"--"
],
"annotations": {
"1": {
"predicate": "SE-A",
"roles": [
"B-Holder",
"B-V",
"_",
"B-Target",
"I-Target",
"I-Target",
"I-Target",
"I-Target",
"I-Target",
"_"
]
}
}
}
The example above encodes a sentence in which "leiden" (to suffer) triggers a subjective expression with Agent view. In Agent view, the agent of the sentence is the opinion holder, while the patient encodes the target of the opinion. The key of the "annotations" dictionary points to the token at position "1" (counting from 0), here the verb form "litten" with the lemma "leiden". The role list states for each sentence token whether it fills a role for the respective subjective expression. We use the BIO scheme to mark multiword roles: "B-" marks the first token of a role, "I-" marks each following token of the same role, and "_" marks tokens that fill no role. "B-V" marks the position of the subjective expression.
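As an illustration of how the role list can be read, the following minimal Python sketch groups the BIO tags of the example entry into spans and maps them back to the tokens. The function names `bio_to_spans` and `describe_entry` are our own and not part of the corpus release. The sketch also assumes that every key in "annotations" is a token position, which holds for the example above but has not been checked against the full data.

```python
from typing import Dict, List, Tuple

# The example entry from above, reproduced so the sketch runs on its own.
entry = {
    "words": ["Sie", "litten", "oftmals", "unter", "sozialer",
              "Isolation", "und", "unter", "Bewegungsmangel", "."],
    "lemmas": ["sie", "leiden", "oftmals", "unter", "sozial",
               "Isolation", "und", "unter", "Bewegungsmangel", "--"],
    "annotations": {
        "1": {
            "predicate": "SE-A",
            "roles": ["B-Holder", "B-V", "_", "B-Target", "I-Target",
                      "I-Target", "I-Target", "I-Target", "I-Target", "_"],
        }
    },
}


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label continues the open span.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B- opens a new span; a stray I- is treated like B-.
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def describe_entry(entry: Dict) -> List[Dict]:
    """Return one record per subjective expression in a sentence entry."""
    records = []
    for position, annotation in entry["annotations"].items():
        index = int(position)  # assumption: keys are token positions
        roles = [
            (label, " ".join(entry["words"][start:end]))
            for label, start, end in bio_to_spans(annotation["roles"])
            if label != "V"  # "V" marks the subjective expression itself
        ]
        records.append({
            "lemma": entry["lemmas"][index],
            "view": annotation["predicate"],
            "roles": roles,
        })
    return records


for record in describe_entry(entry):
    print(record)
```

Role spans are returned as a list rather than a dictionary so that several roles with the same label in one sentence are not collapsed.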
The table below shows the distribution of views and labels in the corpus. The annotation guidelines (in German) can be found here. Examples of the additional labels (Effect, Other) are included in the paper.
| Agent | Patient | Speaker | Total |
SE | 2,325 | 138 | 859 | 3,322 |
Roles (all) | 4,594 | 278 | 1,503 | 6,375 |
Target | 2,422 | 109 | 752 | 3,283 |
Holder | 1,998 | 116 | 12 | 2,126 |
Other | 1 | 0 | 643 | 644 |
PTC | 142 | 4 | 53 | 199 |
SVC | 31 | 5 | 38 | 74 |
Effect | 0 | 44 | 5 | 49 |
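Counts like those in the table could be tallied from the JSON file along the following lines. This is a sketch under assumptions, not the script used for the paper: the helper names `load_corpus` and `tally` are hypothetical, the file path must be supplied by the user, each "B-" tag other than "B-V" is counted as one role, and every value in "annotations" is assumed to have the "predicate" and "roles" keys shown in the example. Only the predicate label "SE-A" appears in this README, so the sketch groups by whatever labels it finds rather than assuming names for the other views.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def load_corpus(path: str) -> Dict:
    """Load the corpus dictionary from a JSON file (path is user-supplied)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def tally(corpus: Dict) -> Tuple[Counter, Counter]:
    """Count subjective expressions per predicate label and roles per (label, role)."""
    views = Counter()
    roles = Counter()
    for entry in corpus.values():
        for annotation in entry.get("annotations", {}).values():
            view = annotation["predicate"]
            views[view] += 1
            for tag in annotation["roles"]:
                # Count each role once, at its first token; skip the SE marker.
                if tag.startswith("B-") and tag != "B-V":
                    roles[(view, tag[2:])] += 1
    return views, roles


# A one-sentence toy corpus built from the example entry in this README.
corpus = {
    "20003_Zusatzpunkt_2_FDP_Brandenburg_ID20306600_18.11.2021-5": {
        "words": ["Sie", "litten", "oftmals", "unter", "sozialer",
                  "Isolation", "und", "unter", "Bewegungsmangel", "."],
        "annotations": {
            "1": {
                "predicate": "SE-A",
                "roles": ["B-Holder", "B-V", "_", "B-Target", "I-Target",
                          "I-Target", "I-Target", "I-Target", "I-Target", "_"],
            }
        },
    }
}

views, roles = tally(corpus)
print(views)
print(roles)
```

On the full data, `tally(load_corpus(path))` would be called with the path to the downloaded JSON file. Whether the resulting numbers match the table exactly depends on the counting assumptions stated above.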
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
If you're using this data, please cite the following paper:
@InProceedings{rehbein-ponzetto-2024-gepade_orl,
author = {Rehbein, Ines and Ponzetto, Simone Paolo},
title = {A New Resource and Baselines for Opinion Role Labelling in German Parliamentary Debates},
booktitle = {Proceedings of the ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora},
month = {May},
year = {2024},
address = {Torino, Italia},
publisher = {Association for Computational Linguistics},
url = {http://www.aclweb.org/anthology/}
}