Make possible to edit corpora and parser results for valency statistics #1128

Open
vmonakhov opened this issue Jul 6, 2024 · 2 comments
Labels: backend (bug is related to backend), enhancement (this label means that resolving the issue would improve some part of the system)


vmonakhov commented Jul 6, 2024

The /adverb tool is used to collect valency statistics for adverbs. The related parser results may have been added, edited, or deleted beforehand. Parser results can be edited when the source text changes and/or when word annotations change, so our report must be refreshed carefully to account for this.

We should scan for updates and for waste (orphaned) and duplicate parser results, sentences, and instances. Waste parser results appear in the database after being removed from a corpus, or for other reasons.
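A minimal sketch of such a cleanup pass, under loudly stated assumptions: `ParserResult` is a hypothetical record (not Lingvodoc's actual schema), a result counts as waste when its source is no longer part of the corpus, and two results count as duplicates when they share the same source and content. The real criteria in the project may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserResult:
    # Hypothetical record: `id` is the database row id, `source_id` identifies the
    # corpus entry the result was produced for, `content` is the parser output.
    id: int
    source_id: int
    content: str


def find_waste_and_duplicates(db_results, live_source_ids):
    """Split stored parser results into three lists:
    waste      - results whose source is no longer part of the corpus,
    duplicates - results with the same source and content as one already kept,
    kept       - the surviving results (lowest id wins among duplicates)."""
    waste, duplicates, kept = [], [], {}
    for result in sorted(db_results, key=lambda r: r.id):
        if result.source_id not in live_source_ids:
            waste.append(result)
            continue
        key = (result.source_id, result.content)
        if key in kept:
            duplicates.append(result)
        else:
            kept[key] = result
    return waste, duplicates, list(kept.values())


results = [ParserResult(1, 10, "a"), ParserResult(2, 10, "a"),
           ParserResult(3, 11, "b"), ParserResult(4, 12, "c")]
waste, dups, kept = find_waste_and_duplicates(results, {10, 11})
# waste -> [id 4], dups -> [id 2], kept -> [ids 1, 3]
```

Sorting by id makes the choice of which duplicate survives deterministic; the waste and duplicate lists are what would then be deleted from the database.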

As mentioned above, parser results can change when the related text changes. We should compare the sentences for the report with those stored in the database and add or delete database entries as required. Sentences can change order or become shorter or longer; the words within them can change order as well, and whitespace and punctuation marks can also change.

The main requirement is to reuse existing sentences/instances and only update them when something changes.
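A minimal sketch of the reuse-or-create reconciliation, assuming (hypothetically) that stored sentences are available as a mapping from database id to text and that two sentences are "the same" when their word sequences match after dropping whitespace, punctuation, and case. This exact-match criterion handles reordered sentences and whitespace/punctuation edits, but not sentences that became shorter or longer; those need a fuzzier rule.

```python
import re
from collections import defaultdict, deque

WORD_RE = re.compile(r"\w+")


def normalize(sentence):
    # Ignore whitespace, punctuation and case: only the word sequence matters.
    return tuple(word.lower() for word in WORD_RE.findall(sentence))


def reconcile(db_sentences, report_sentences):
    """db_sentences: hypothetical mapping of database id -> stored sentence text.
    report_sentences: sentences of the refreshed report, in their new order.
    Returns (reused, created, deleted):
      reused  - (db_id, new_position) pairs to update in place,
      created - positions of report sentences that need new rows,
      deleted - db ids that no longer match any report sentence."""
    by_key = defaultdict(deque)
    for db_id in sorted(db_sentences):
        by_key[normalize(db_sentences[db_id])].append(db_id)
    reused, created = [], []
    for pos, text in enumerate(report_sentences):
        candidates = by_key.get(normalize(text))
        if candidates:
            reused.append((candidates.popleft(), pos))
        else:
            created.append(pos)
    deleted = sorted(i for ids in by_key.values() for i in ids)
    return reused, created, deleted


db = {1: "The cat sat.", 2: "Old sentence here", 3: "Hello,  world!"}
report = ["hello world", "The cat  sat !", "A new one"]
print(reconcile(db, report))  # ([(3, 0), (1, 1)], [2], [2])
```

Keeping a queue per normalized key lets repeated identical sentences each reuse a distinct stored row instead of all pointing at the same one.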

Corpus for testing:
Uralic › Finno-Permic › Permian › Udmurt › Corpus of Udmurt texts часть 1 › Texts

vmonakhov added the enhancement and backend labels Jul 6, 2024
@vmonakhov vmonakhov self-assigned this Jul 6, 2024
vmonakhov (author) commented:

Resolved. Main points:

  • Sentences can become shorter or longer and can change their order and/or the order of their words. A sentence "keeps its identity" if more than 75% of its words remain in it unchanged.
  • Sentences and items are updated in place. New sources/sentences/instances are created, and waste ones are removed.
  • We look for duplicate and waste sources/sentences/instances and get rid of them.
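The 75% rule can be sketched as a word-overlap check. This is an illustration, not the project's code: it assumes sentences are already split into word lists, counts words with multiplicity, ignores word order, and (as an assumption) measures the share relative to the old sentence's length.

```python
from collections import Counter

THRESHOLD = 0.75  # "more than 75% of words stay", per the resolution comment


def keeps_identity(old_words, new_words, threshold=THRESHOLD):
    """Return True if more than `threshold` of the old sentence's words (counted
    with multiplicity) still occur in the new sentence, in any order."""
    if not old_words:
        # Assumption: an empty sentence only matches another empty one.
        return not new_words
    common = Counter(old_words) & Counter(new_words)  # multiset intersection
    return sum(common.values()) / len(old_words) > threshold


print(keeps_identity("a b c d".split(), "d c b a e".split()))  # True: 4/4 kept
print(keeps_identity("a b c d".split(), "a b c x".split()))    # False: 3/4 is not > 0.75
```

Dividing by the old length means a sentence that only grows still keeps its identity; dividing by the longer of the two lengths would be a stricter alternative.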

vmonakhov added a commit to ispras/lingvodoc that referenced this issue Jul 9, 2024
* all_changes

* cleanup

* fixes for delete
vmonakhov commented Jul 11, 2024

  1. It seems we have to delete linked annotations if we reuse unrelated instances.
  2. It is not possible to update verb valency data in place, because we have to preserve the uniqueness of the valency_merge_data primary key (valency_merge_data_pkey), i.e. of the combination perspective_client_id, perspective_object_id, verb_lex.
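A small illustration of the conflict, assuming the key consists of exactly the three columns named above (the actual table definition is not shown here) and using a hypothetical in-memory set of existing keys: if an edit changes a row's verb_lex to a value that already has a row in the same perspective, an in-place update would violate the key.

```python
def can_update_verb_lex(existing_keys, perspective_id, old_lex, new_lex):
    """existing_keys: set of (perspective_client_id, perspective_object_id, verb_lex)
    tuples already stored (assumed to be the full primary key).
    Changing verb_lex from old_lex to new_lex in place is safe only if the
    resulting key is not already taken by another row."""
    if old_lex == new_lex:
        return True
    client_id, object_id = perspective_id
    return (client_id, object_id, new_lex) not in existing_keys


keys = {(1, 2, "go"), (1, 2, "read")}
print(can_update_verb_lex(keys, (1, 2), "go", "read"))   # False: key already exists
print(can_update_verb_lex(keys, (1, 2), "go", "write"))  # True
```

When the check fails, the affected rows would have to be merged or deleted and recreated rather than updated.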

@vmonakhov vmonakhov changed the title Make possible to edit corpora and parser results for adverb statistics Make possible to edit corpora and parser results for valency statistics Jul 12, 2024
vmonakhov added a commit to ispras/lingvodoc that referenced this issue Jul 12, 2024
…act#1128 (#1510)


* merging verb and adverb functionality

* more merging

* fixes and refactoring

* fix

* minor

* next

* next

* huge merge

* cleanup

* common names

* disable parse_eaf for adverbs
vmonakhov added a commit that referenced this issue Jul 12, 2024
* fixes and refactoring

* common names