Hi @miguelwon, I'm afraid there are currently no sentence-boundary markers in the files. I am working on enriching the German, and subsequently the French, data with sentence boundaries (empty lines) as part of the overall reworking of the data here: https://github.com/EuropeanaNewspapers/ner-corpora/tree/0.2. It is tedious work, though, and will take more time. If you need sentence boundaries now, I recommend using a tokenizer.
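As a rough illustration of that workaround, here is a minimal sketch that adds empty lines after sentence-final punctuation. It is only a naive rule-based stand-in for a real sentence tokenizer, and it assumes one whitespace-separated `token TAG` pair per line, which may not match the actual files. The names `SENTENCE_END` and `add_sentence_breaks` are hypothetical. A trained tokenizer would likely handle abbreviations and OCR noise better.

```python
# Hypothetical heuristic: tokens that are assumed to end a sentence.
SENTENCE_END = {".", "!", "?"}

def add_sentence_breaks(lines):
    """Yield the input lines, adding an empty line after each sentence-final token.

    Assumes each non-empty line looks like "token TAG" (an assumption about
    the file layout, not a documented format).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop existing blank lines so breaks are not doubled
        yield line
        token = line.split()[0]
        if token in SENTENCE_END:
            yield ""  # empty line marks the sentence boundary

sample = ["Das O", "ist O", "Wien B-LOC", ". O", "Hallo O", "! O"]
result = list(add_sentence_breaks(sample))
```

Abbreviations such as "Dr." would be split incorrectly by this rule, so the output is best treated as a first pass to be checked by hand.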
Is there any marker that separates the sentences? Usually, in BIO files, each sentence is separated from the next by an empty line.
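For context, this is roughly how a reader relies on that convention. The sketch below groups lines into sentences at each empty line; it assumes the last whitespace-separated field on each line is the tag, and `read_sentences` is a hypothetical helper, not part of any library.

```python
def read_sentences(lines):
    """Group "token TAG" lines into sentences, splitting on empty lines."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # An empty line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumes every non-empty line has at least two fields.
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)  # last sentence may lack a trailing empty line
    return sentences

parsed = read_sentences(["Wien B-LOC", "ist O", "", "Hallo O"])
```

Without the empty lines, the whole file comes back as a single sentence, which is the problem described here.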