Remove manual override keys for uttid in ChatReader #31

Open · wants to merge 1 commit into base: main
13 changes: 2 additions & 11 deletions corpus2alpino/readers/chat.py
@@ -2,18 +2,16 @@
"""
Module for reading CHAT cha files to parsable utterances.
"""
from typing import cast, Dict, Iterable, List, Tuple
from typing import cast, Dict, Iterable, List
from chamd import ChatReader as ChatParser, ChatLine, ChatTier

import os
import re

from corpus2alpino.abstracts import Reader
from corpus2alpino.models import CollectedFile, Document, MetadataValue, Utterance

MANUAL_IDS = ['xsid', 'xuid']
UTTERANCE_NUMBER_ID = 'uttno'


class ChatReader(Reader):
    """
    Class for converting a CHAT file to document.
@@ -31,13 +29,6 @@ def parse_utterances(self, chat_lines: List[ChatLine]):
        number = 0
        for line in chat_lines:
            number += 1 # start numbering utterances from 1
            for id_override_key in MANUAL_IDS:
                try:
                    line.uttid = line.tiers[id_override_key].text
                    line.metadata['uttid'].text = line.uttid
                    break
                except KeyError:
                    pass

            yield Utterance(line.text,
                            str(line.uttid),
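The effect of deleting the override loop can be sketched in isolation. This is a minimal illustration, not chamd's API: `Tier` and `Line` are hypothetical stand-ins for chamd's `ChatTier`/`ChatLine` that model only the attributes the removed loop touches (`uttid`, `tiers[...].text`, `metadata[...].text`), and `uttid_before`/`uttid_after` are hypothetical helper names. The values mirror the test fixture: utterance 2 carries an `xsid` tier of 42.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical stand-ins for chamd's ChatTier/ChatLine (assumption: only the
# attributes used by the removed loop are modelled here).
@dataclass
class Tier:
    text: str

@dataclass
class Line:
    uttid: Union[int, str]
    tiers: Dict[str, Tier] = field(default_factory=dict)
    metadata: Dict[str, Tier] = field(default_factory=dict)

MANUAL_IDS = ['xsid', 'xuid']

def uttid_before(line: Line) -> str:
    """Old behaviour: the first xsid/xuid tier found replaces the uttid."""
    for id_override_key in MANUAL_IDS:
        try:
            line.uttid = line.tiers[id_override_key].text
            line.metadata['uttid'].text = line.uttid
            break
        except KeyError:
            pass  # tier (or metadata entry) missing: try the next key
    return str(line.uttid)

def uttid_after(line: Line) -> str:
    """New behaviour: the uttid assigned by the parser is kept unchanged."""
    return str(line.uttid)

def make_line() -> Line:
    # Utterance number 2 with a manual xsid tier of 42, as in the fixture.
    return Line(uttid=2, tiers={'xsid': Tier('42')}, metadata={'uttid': Tier('2')})

print(uttid_before(make_line()))  # 42
print(uttid_after(make_line()))   # 2
```

With the loop gone, an `xsid` or `xuid` tier no longer changes the utterance ID; it simply remains available as ordinary metadata (the fixture still emits `xsid = 42`).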
4 changes: 2 additions & 2 deletions tests/example_chat_expected.txt
@@ -26,11 +26,11 @@
##META text origutt = toen <ging zij &-eh haar &+ma ging zij> [//] wou zij de auto maken . 21792_23475
##META text parsefile = PRELAN_example_chat_u00000000002.xml
##META int uttendlineno = 10
##META int uttid = 42
##META int uttid = 2
##META int uttno = 2
##META int uttstartlineno = 9
##META text xsid = 42
42|toen wou zij de auto maken .
2|toen wou zij de auto maken .

##META text origutt = maar toen reed de auto er vandoor .
##META text parsefile = PRELAN_example_chat_u00000000003.xml