[Task Submission] Divergent DepRel Distributions (europarl_dbca_splits) #33
Conversation
Hi GenBench team! To accommodate multiple datasets, I created this new Task Submission with subtasks, which replaces the old submission https://github.com/GenBench/genbench_cbt/pull/15. I hope this is ok!
Yes, that's alright; make sure your paper submission contains the right PR URL, though!
@anmoisio We're in the process of merging the tasks into the repo. In order to merge your task, we need the following changes:
Hey @anmoisio! Is there any update on the usage_example?
Hi @kazemnejad, sorry for the delay; see the last commits for the example. One question about subtasks: I have used the subtask feature in this task, although it doesn't really have subtasks. Rather, it has sub-datasets, in the sense that the abstract task does not change between subtasks; only the dataset differs. There is now a lot of repetition, because I copied task.py etc. unchanged for each subtask. So my question is: is there a better way to include sub-datasets for one task?
@kazemnejad I'd recommend creating an abstract Task (e.g.
Hi @kazemnejad, sorry to commit after you added the ready-to-be-merged tag already, but I have now removed the repetitive code as you instructed. Thanks for the help again!
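The abstract-Task pattern recommended above can be sketched as follows. All class names here are hypothetical, and the metric is a placeholder exact-match rate rather than the chrF2++ used by the real task; the point is only that shared logic lives in one base class while each subtask carries just its split-specific configuration.

```python
from abc import ABC


class BaseDbcaTask(ABC):
    """Hypothetical shared base for all europarl_dbca_splits subtasks.

    Each subtask differs only in which data split it loads, so the
    evaluation logic is defined once here and inherited everywhere.
    """

    split_name: str = ""

    def evaluate_predictions(self, predictions, gold):
        # Placeholder metric (exact-match rate) to illustrate the
        # inheritance pattern; the real task computes chrF2++.
        matches = sum(p == g for p, g in zip(predictions, gold))
        return matches / len(gold)


class Comdiv0De(BaseDbcaTask):
    split_name = "comdiv0_de"  # low compound divergence, German


class Comdiv1De(BaseDbcaTask):
    split_name = "comdiv1_de"  # high compound divergence, German
```

With this layout, adding a new language pair or divergence level is a two-line subclass instead of a copied module.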
Divergent DepRel Distributions
Note: this PR replaces https://github.com/GenBench/genbench_cbt/pull/15
To assess NMT models' capacity to translate novel syntactic structures, we split the Europarl parallel corpus into training and test sets with divergent distributions of syntactic structures. The data-splitting method derives from the distribution-based compositionality assessment (DBCA) introduced by Keysers et al. (2020). We define the atoms as the lemmas and dependency relations, and the compounds as three-element tuples of two lemmas (the head and the dependent) and their relation, for instance (appreciate, dobj, vigilance).
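The atom and compound definitions above can be made concrete with a small sketch. Here a parsed sentence is assumed to already be a list of (head_lemma, deprel, dependent_lemma) triples; in practice these would come from a dependency parser, and the function name is hypothetical.

```python
from collections import Counter


def extract_atoms_and_compounds(parsed_sentence):
    """Count atoms and compounds in one dependency-parsed sentence.

    `parsed_sentence` is assumed to be a list of
    (head_lemma, deprel, dependent_lemma) triples.
    Atoms are individual lemmas and dependency relations;
    compounds are the full three-element tuples.
    """
    atoms, compounds = Counter(), Counter()
    for head, rel, dep in parsed_sentence:
        atoms[head] += 1
        atoms[rel] += 1
        atoms[dep] += 1
        compounds[(head, rel, dep)] += 1
    return atoms, compounds
```

For the example from the text, the triple (appreciate, dobj, vigilance) yields three atoms and one compound.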
Authors
[email protected]
[email protected]
[email protected]
Implementation
This submission modifies the task.py module: evaluate_predictions() is overridden to compute the chrF2 score with the Hugging Face evaluate library, and auxiliary methods are added to calculate the divergences between the train and test compound and atom distributions.
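The train-test divergence in DBCA is based on the Chernoff coefficient between two discrete distributions. A minimal sketch, not the submission's actual code, might look like this; Keysers et al. (2020) use alpha = 0.5 for atom divergence and alpha = 0.1 for compound divergence.

```python
def normalise(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def chernoff_divergence(p, q, alpha):
    """1 minus the Chernoff coefficient of distributions p and q.

    Returns 0.0 when p == q and 1.0 when their supports are disjoint.
    alpha = 0.5 is used for atom divergence and alpha = 0.1 for
    compound divergence in DBCA (Keysers et al., 2020).
    """
    keys = set(p) | set(q)
    return 1.0 - sum(
        p.get(k, 0.0) ** alpha * q.get(k, 0.0) ** (1 - alpha) for k in keys
    )
```

A low-compound-divergence split keeps this value near 0 between train and test compound distributions, while a high-divergence split pushes it towards 1, with atom divergence kept low in both cases.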
Usage
To evaluate generalisation, both the low- and high-compound-divergence data splits should be evaluated. Therefore, run both subtasks for the selected language, e.g. "comdiv0_de" and "comdiv1_de", and take the ratio of the chrF2++ scores:
task.comdiv1_de.evaluate_predictions(predictions, gold) / task.comdiv0_de.evaluate_predictions(predictions, gold)
Checklist: the task was checked with the genbench-cli test-task tool.