Predicting translation confidence for Transformer-based Neural Machine Translators.

DhruvaBansal00/ConfidentMT

ConfidentMT

Developing accurate Machine Translation (MT) models for low-resource languages, where only limited parallel data (sentences or documents translated across two languages) is available, remains challenging even with state-of-the-art deep learning methods. Determining the correctness, or confidence, of a given translation is therefore a pressing problem: filtering out inaccurate translations is often more practical than trying to improve the underlying predictions. In this project, we train transformer models for Nepali-to-English translation using the Facebook Low Resource (FLORES) MT benchmark dataset. We explore various thresholding metrics to classify translations as confident or not confident, aiming to maximize the corpus BLEU score over all confident translations at varying thresholds. We also apply ensembling and Noisy Channel Decoding (NCD) to further improve corpus BLEU scores. Our results show that translation quality can be estimated well using metrics such as the average token log-likelihood and the NCD score.
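The confidence-thresholding idea above can be sketched in a few lines: given the per-token log-likelihoods that a transformer decoder assigns to a hypothesis, average them and keep only translations whose score clears a threshold. This is a minimal illustration, not the project's actual code; the function names and the threshold value are assumptions.

```python
def avg_token_log_likelihood(token_logprobs):
    """Average per-token log likelihood of one translation hypothesis.

    token_logprobs: list of log probabilities, one per output token,
    as produced by a decoder (e.g. fairseq's hypothesis scores).
    """
    if not token_logprobs:
        raise ValueError("empty hypothesis has no score")
    return sum(token_logprobs) / len(token_logprobs)


def is_confident(token_logprobs, threshold=-1.0):
    """Classify a translation as 'confident' if its average token
    log likelihood meets the threshold (threshold value is illustrative;
    in practice it is swept to trade off coverage against corpus BLEU).
    """
    return avg_token_log_likelihood(token_logprobs) >= threshold
```

For example, a hypothesis with token log-probs `[-0.2, -0.4]` averages to `-0.3` and passes a `-1.0` threshold, while `[-3.0, -2.0]` averages to `-2.5` and is filtered out. Averaging (rather than summing) keeps the score comparable across translations of different lengths.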
