SpkAtt-2023

This repository contains the data and supplementary materials for Task 1 of the 2023 Shared Task on Speaker Attribution (SpkAtt-2023), co-located with KONVENS 2023.

Important dates:

  • February 2023 - Trial data release
  • April 1, 2023 - Training and development data release
  • June 15, 2023 - Test data release (blind)
  • July 1, 2023 - Submissions open
  • July 31, 2023 - Submissions for Task1, subtask1 (full task) close
  • August 3, 2023 - Submissions for Task1, subtask 2 (roles only) close
  • August 14, 2023 - System descriptions due
  • September 15, 2023 - Camera-ready system paper deadline
  • September 18, 2023 - Workshop at KONVENS 2023

Workshop program

Sep 18, 2023 @ KONVENS 2023

Program schedule

  • 15:00: Welcome & Shared Task Overview (ST organisers)
  • 15:30: Speaker Attribution in German Parliamentary Debates with QLoRA-adapted Large Language Models (Tobias Bornheim, Niklas Grieger, Patrick Gustav Blaneck and Stephan Bialonski)
  • 16:00: Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates (Anton Ehrmanntraut)
  • 16:30: Discussion
  • 17:00: Closing

Proceedings

The proceedings can be found here: pdf.

Task 1 data format:

The data is available in JSON format, with each document (speech) stored as a separate JSON file.

The unit of analysis is the sentence (we changed the format from paragraphs to sentences).

The json dictionary includes a list of Sentences and a list of Annotations. Each item in the Sentences list is a dictionary with SentenceID and a list of Tokens for this sentence. Each item in the Annotations list is a dictionary that includes the ids (sentence:token id) for the cue word(s) that trigger a speech event and the ids for the roles that are realised for this cue.
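As a rough illustration, the sketch below reads one speech file and prints the cue and role tokens for each annotation. The keys Sentences, SentenceID, Tokens and Annotations are taken from the description above; the file name and the per-annotation keys used here (Cue, Roles) are placeholders, so please consult the data format pdf (Dataformat_Task1_a.pdf) for the authoritative layout.

```python
import json

# Minimal sketch, assuming the top-level keys "Sentences", "SentenceID",
# "Tokens" and "Annotations" described above; the file name and the
# per-annotation keys ("Cue", "Roles") are hypothetical placeholders.
with open("speech_0001.json", encoding="utf-8") as f:
    doc = json.load(f)

# Map each SentenceID to its token list so "sentence:token" ids can be resolved.
tokens_by_sentence = {str(s["SentenceID"]): s["Tokens"] for s in doc["Sentences"]}

def resolve(token_id):
    """Resolve a 'sentence:token' id to its surface token (assumed id format)."""
    sent_id, tok_idx = token_id.split(":")
    return tokens_by_sentence[sent_id][int(tok_idx)]

for ann in doc["Annotations"]:
    cue_words = [resolve(tid) for tid in ann["Cue"]]          # assumed key name
    print("cue:", " ".join(cue_words))
    for role, ids in ann.get("Roles", {}).items():            # assumed key name
        print(f"  {role}:", " ".join(resolve(tid) for tid in ids))
```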

For a more detailed description of the data format (Task 1) and some examples, see this pdf. For more information on our annotation scheme, please refer to the annotation guidelines. Please note that the guidelines have not yet been finalised and might include some inconsistencies and errors, which we will try to fix over the next couple of weeks.

We tried to harmonise the data formats for Task 1 and Task 2 as much as possible, which resulted in a file format where the annotations are separated from the text. This makes the data a bit harder to inspect, so we also provide an alternative, more human-readable format, described here. However, the official shared task format is the one described in the first document (Dataformat_Task1_a.pdf), and we do not provide evaluation scripts for the second format.