Developing accurate Machine Translation (MT) models for low-resource languages is already challenging, even with state-of-the-art Deep Learning methods, because only limited parallel data (sentences or documents translated across two languages) is available. Estimating the correctness, or confidence, of a given translation has therefore become an important problem: rejecting an inaccurate translation is often more valuable than emitting it in the first place. In this project, we train Transformer models from Nepali to English using the Facebook Low Resource (FLoRes) MT benchmark dataset. We explore various thresholding metrics to classify translations as confident or not confident, and try to maximize the corpus BLEU score over all confident translations at varying thresholds. We also apply ensembling and Noisy Channel Decoding (NCD) techniques to further improve our model's corpus BLEU scores. Our results show that translation quality can be estimated well using metrics such as the average token log-likelihood and the NCD score.
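The confidence-thresholding idea above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the per-token log-probabilities and the threshold value are made-up examples of what a decoder such as fairseq would emit, and the noisy-channel weights are illustrative rather than tuned values.

```python
def avg_token_logprob(token_logprobs):
    """Average token log-likelihood: the length-normalised model score
    used as a confidence estimate for a translation."""
    return sum(token_logprobs) / len(token_logprobs)

def filter_confident(hyps, threshold=-1.0):
    """Keep only translations whose average token log-probability
    clears the threshold; corpus BLEU is then computed on this subset."""
    return [h for h in hyps if avg_token_logprob(h["token_logprobs"]) >= threshold]

def noisy_channel_score(direct_lp, channel_lp, lm_lp, l1=1.0, l2=0.3):
    """Noisy channel score: log p(y|x) + l1 * log p(x|y) + l2 * log p(y).
    The l1/l2 weights here are illustrative, not the project's tuned values."""
    return direct_lp + l1 * channel_lp + l2 * lm_lp

# Hypothetical hypotheses with per-token log-probabilities (illustrative values).
hypotheses = [
    {"translation": "the weather is nice today",
     "token_logprobs": [-0.2, -0.1, -0.3, -0.4, -0.2]},
    {"translation": "he goed to school yesterday",
     "token_logprobs": [-1.5, -2.8, -0.9, -1.1, -2.0]},
    {"translation": "we will meet tomorrow",
     "token_logprobs": [-0.3, -0.5, -0.2, -0.4]},
]

confident = filter_confident(hypotheses, threshold=-1.0)
# The second hypothesis (avg -1.66) falls below the threshold and is rejected.
```

Sweeping the threshold trades coverage against quality: a stricter threshold keeps fewer translations but tends to raise the corpus BLEU of the retained set.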
DhruvaBansal00/ConfidentMT
About
Predicting translation confidence for Transformer-based Neural Machine Translators.