Susana Sotelo Docío edited this page Dec 10, 2020 · 29 revisions

Detected bugs

Splitter

  • [es] Úsalo [fixed in commit b38f047]
    Some verb forms are not split correctly when they start with an uppercase letter.

    Úsalo con precaución.

    expected: 'Usa', 'lo', 'con', 'precaución', '.'
    got: 'Úsalo', 'con', 'precaución', '.'

    The same happens with two clitics [fixed in commit 2e28d5d]

    Ábreselo antes.

    expected: 'Abre', 'se', 'lo', 'antes', '.'
    got: 'Ábreselo', 'antes', '.'

  • [gl] Este [fixed in commit e09dd4f]
    The determiner este is incorrectly split at the beginning of the sentence.

    Este xeito.

    expected: 'Este', 'xeito', '.'
    got: 'Es', 'te', 'xeito', '.'

    Value of variable $excep

  • [es] Abráselo
    Some imperative forms of abrasar are incorrectly lemmatized as abrir when combined with some clitics.

    Abráselo inmediatamente.

    expected: 'Abrase', 'lo'
    got: 'Abra', 'se', 'lo'

  • [es] Enseñarlo [fixed in commit 27c9f8e]
    Verb forms that contain non-ASCII characters (e.g. "ñ") and have clitics attached are split incorrectly.

    Quiero enseñarlo.

    expected: 'enseñar', 'lo'
    got: 'enseñarlo'

  • [gl] Splitted entities [fixed in commit 706a7bd]
    Some entities are split even in unambiguous positions (in the middle of the sentence).

    O concerto foi na Casa das Crechas

    expected: 'Casa', 'de', 'as', 'Crechas'
    got: 'Casa', 'de', 'as', 'Cre', 'che', 'as'

    Other examples: Follas Vellas, Ponte Caldelas, Alfama, Área Central, Rías Baixas, Torrente Ballester, Apóstolo, Orella

  • [es] Correos [fixed in commit 7a53f3a]
    The entity Correos is ambiguous with the imperative form correos (corred + os).

    Tengo que ir a Correos inmediatamente.

    expected: 'Correos'
    got: 'Corred', 'os'

  • [es] Reírse [fixed in commit 23fab4a]
    Verbs with accented infinitives are split incorrectly when combined with one clitic.

    No quiere reírse de él.

    expected: 'reír', 'se'
    got: 'reírse'
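
The Splitter reports above can be kept as regression data. The following is only a minimal sketch in Python: the `split` callable is hypothetical (a wrapper around whatever splitter is being tested, taking a language code and a sentence and returning a list of tokens) and is not part of the project. Because several reports list only the relevant tokens rather than the whole sentence, the check looks for the expected tokens as a contiguous run in the output.

```python
from typing import Callable, List, Sequence, Tuple

# (language, input sentence, expected tokens), copied from the reports above.
Case = Tuple[str, str, List[str]]

SPLITTER_CASES: List[Case] = [
    ("es", "Úsalo con precaución.", ["Usa", "lo", "con", "precaución", "."]),
    ("es", "Ábreselo antes.", ["Abre", "se", "lo", "antes", "."]),
    ("gl", "Este xeito.", ["Este", "xeito", "."]),
    ("es", "Abráselo inmediatamente.", ["Abrase", "lo"]),
    ("es", "Quiero enseñarlo.", ["enseñar", "lo"]),
    ("gl", "O concerto foi na Casa das Crechas", ["Casa", "de", "as", "Crechas"]),
    ("es", "Tengo que ir a Correos inmediatamente.", ["Correos"]),
    ("es", "No quiere reírse de él.", ["reír", "se"]),
]


def contains_run(tokens: Sequence[str], expected: Sequence[str]) -> bool:
    """Return True if `expected` appears as a contiguous run inside `tokens`."""
    n = len(expected)
    return any(
        list(tokens[i:i + n]) == list(expected)
        for i in range(len(tokens) - n + 1)
    )


def failing_cases(
    split: Callable[[str, str], Sequence[str]],  # hypothetical splitter wrapper
    cases: Sequence[Case] = SPLITTER_CASES,
) -> List[Tuple[str, str]]:
    """Return (language, sentence) for every case whose expected tokens are missing."""
    return [
        (lang, sentence)
        for lang, sentence, expected in cases
        if not contains_run(split(lang, sentence), expected)
    ]
```

Calling `failing_cases(my_split)` with a real wrapper returns the reports that still reproduce; an empty list means every expected token sequence was found.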

Tagger

  • [gl] a ría [fixed in commit ac8d345]

    A ría de Vigo.

    expected: "ría ría NCFS000"
    got: "ría rir VMII3S0"

  • [pt] Mos
    This entity is incorrectly split and tagged as PP+PP, even in unambiguous positions (e.g. when it starts with an uppercase letter in the middle of the sentence).

    Estive em Mos no verão.

    expected: "Mos mos NP00000"
    got: "Mos me+os PP+PP"
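
The Tagger reports above give the output as "form lemma tag" triples. As a rough sketch, assuming that whitespace-separated three-field layout (the names `TaggedToken`, `parse_tagged` and `differing_fields` are made up for this example and do not come from the project), the expected and actual lines can be compared field by field to see what exactly went wrong:

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    form: str
    lemma: str
    tag: str


def parse_tagged(line: str) -> TaggedToken:
    """Parse one 'form lemma tag' line; raises ValueError if it lacks three fields."""
    form, lemma, tag = line.split()
    return TaggedToken(form, lemma, tag)


def differing_fields(expected: str, got: str) -> List[str]:
    """Return the names of the fields in which the two tagged lines differ."""
    exp, act = parse_tagged(expected), parse_tagged(got)
    return [
        name
        for name in TaggedToken._fields
        if getattr(exp, name) != getattr(act, name)
    ]


# The two reports above, compared field by field.
ria_diff = differing_fields("ría ría NCFS000", "ría rir VMII3S0")
mos_diff = differing_fields("Mos mos NP00000", "Mos me+os PP+PP")
```

In both reports the form is preserved and only the lemma and the tag differ, so `ria_diff` and `mos_diff` both come out as `["lemma", "tag"]`.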

NER/NEC

  • [es] Castilla-La Mancha [fixed in commit cb6508c]
    This is a multi-token entity, but is treated as two independent entities.

    Yo soy de Castilla-La Mancha.

    expected: "Castilla-La_Mancha NP00G00"
    got: "Castilla-La castilla-la NP00V00
    Mancha mancha NP00G00"
