Developing accurate Machine Translation (MT) models for low-resource languages is already challenging, even with state-of-the-art Deep Learning methods, because only limited parallel data (sentences or documents translated across two languages) is available. Estimating the correctness, or confidence, of a given translation has therefore become an important problem: rejecting an inaccurate translation is often more valuable than emitting it in the first place. In this project, we train Transformer models from Nepali to English using the Facebook Low Resource (FLoRes) MT benchmark dataset. We explore various thresholding metrics to classify translations as confident or not confident, and try to maximize the corpus BLEU score over all confident translations at varying thresholds. We also apply ensembling and Noisy Channel Decoding (NCD) techniques to further improve our model's corpus BLEU scores. Our results show that translation quality can be estimated well using metrics such as the average token log-likelihood and the NCD score.
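The confidence-thresholding idea above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the per-token log-probabilities and the threshold value are made-up examples of what a decoder such as fairseq would emit, and the noisy-channel weights are illustrative rather than tuned values.

```python
def avg_token_logprob(token_logprobs):
    """Average token log-likelihood: the length-normalised model score
    used as a confidence estimate for a translation."""
    return sum(token_logprobs) / len(token_logprobs)

def filter_confident(hyps, threshold=-1.0):
    """Keep only translations whose average token log-probability
    clears the threshold; corpus BLEU is then computed on this subset."""
    return [h for h in hyps if avg_token_logprob(h["token_logprobs"]) >= threshold]

def noisy_channel_score(direct_lp, channel_lp, lm_lp, l1=1.0, l2=0.3):
    """Noisy channel score: log p(y|x) + l1 * log p(x|y) + l2 * log p(y).
    The l1/l2 weights here are illustrative, not the project's tuned values."""
    return direct_lp + l1 * channel_lp + l2 * lm_lp

# Hypothetical hypotheses with per-token log-probabilities (illustrative values).
hypotheses = [
    {"translation": "the weather is nice today",
     "token_logprobs": [-0.2, -0.1, -0.3, -0.4, -0.2]},
    {"translation": "he goed to school yesterday",
     "token_logprobs": [-1.5, -2.8, -0.9, -1.1, -2.0]},
    {"translation": "we will meet tomorrow",
     "token_logprobs": [-0.3, -0.5, -0.2, -0.4]},
]

confident = filter_confident(hypotheses, threshold=-1.0)
# The second hypothesis (avg -1.66) falls below the threshold and is rejected.
```

Sweeping the threshold trades coverage against quality: a stricter threshold keeps fewer translations but tends to raise the corpus BLEU of the retained set.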
DhruvaBansal00/ConfidentMT
About
Predicting translation confidence for Transformer-based Neural Machine Translators.