Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation
Course project of LDA-T313 Approaches to Natural Language Understanding
Zihao Li & Chao Wang
This repository contains the implementation of the project "Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation". The project aims to investigate the efficiency of cross-linguistic knowledge transfer in multilingual Neural Machine Translation (NMT) using knowledge distillation techniques.
The study focuses on two main objectives:
- Cross-Linguistic Knowledge Transfer: Evaluate how effectively student models trained on one language perform in translating other related languages within the same language family.
- Correlation of Language Similarity with Transfer Effectiveness: Investigate whether the effectiveness of cross-linguistic knowledge transfer correlates with the degree of linguistic similarity among languages.
We utilize two pre-trained multilingual NMT models from the Helsinki-NLP OPUS-MT project:
- opus-mt-tc-big-gmq-en: translates Danish, Norwegian, and Swedish to English.
- opus-mt-tc-big-zle-en: translates Belarusian, Russian, and Ukrainian to English.
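A minimal sketch of querying one of these teachers, assuming the Hugging Face ports of the OPUS-MT checkpoints; the example sentence and decoding settings are illustrative:

```python
# Load a teacher model and translate one Swedish sentence to English.
# The model ID is the Hugging Face port of the OPUS-MT checkpoint;
# everything else here is illustrative.
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-tc-big-gmq-en"
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name)

batch = tokenizer(["Jag har läst boken."], return_tensors="pt", padding=True)
outputs = teacher.generate(**batch, num_beams=4)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```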
Our training datasets are derived from the NLLB corpus and filtered for quality with OpusFilter. Each English-centric language pair contributes 5 million parallel sentences.
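The actual filtering is configured through OpusFilter; as an illustrative stand-in, the sketch below applies the kind of length-ratio filter OpusFilter provides (file names and thresholds are assumptions):

```python
# Keep a sentence pair only if both sides are non-empty, not overly long,
# and their token-length ratio is within bounds. This is an illustrative
# stand-in for the OpusFilter configuration used in the project.
def keep_pair(src: str, tgt: str, max_ratio: float = 3.0, max_len: int = 200) -> bool:
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if src_len == 0 or tgt_len == 0 or max(src_len, tgt_len) > max_len:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio

with open("nllb.sv", encoding="utf-8") as f_src, \
     open("nllb.en", encoding="utf-8") as f_tgt, \
     open("filtered.sv", "w", encoding="utf-8") as out_src, \
     open("filtered.en", "w", encoding="utf-8") as out_tgt:
    for s, t in zip(f_src, f_tgt):
        if keep_pair(s.strip(), t.strip()):
            out_src.write(s)
            out_tgt.write(t)
```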
The distillation process uses the outputs of pre-trained teacher models as the target translations for training smaller student models.
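A minimal sketch of this sequence-level distillation step, assuming the Hugging Face port of the zle-en teacher; the file names, batch size, and decoding settings are illustrative assumptions:

```python
# Generate teacher translations that serve as the student's training targets
# (sequence-level distillation). File names and hyperparameters are illustrative.
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-tc-big-zle-en"
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name).eval()

def translate(batch_src):
    """Translate a batch of source sentences with the teacher model."""
    enc = tokenizer(batch_src, return_tensors="pt", padding=True, truncation=True)
    out = teacher.generate(**enc, num_beams=4, max_new_tokens=256)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

with open("train.ru", encoding="utf-8") as src_file, \
     open("train.distilled.en", "w", encoding="utf-8") as tgt_file:
    buffer = []
    for line in src_file:
        buffer.append(line.strip())
        if len(buffer) == 32:
            tgt_file.write("\n".join(translate(buffer)) + "\n")
            buffer = []
    if buffer:
        tgt_file.write("\n".join(translate(buffer)) + "\n")
```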
The student models are trained with the `train.sh` script. The teacher and student architectures are summarized below:
| Parameter | Teacher | Student |
|---|---|---|
| Embedding dimension | 1024 | 256 |
| Attention heads | 16 | 8 |
| Feed-forward network dimension | 4096 | 2048 |
| Hidden layers | 6 | 3 |
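As a minimal sketch of what the student column corresponds to, the snippet below instantiates a randomly initialised student via Hugging Face's MarianConfig; the vocabulary size is a placeholder assumption, and the actual training is run through `train.sh`:

```python
# Build a student model with the dimensions from the table above.
from transformers import MarianConfig, MarianMTModel

vocab = 32000  # placeholder; depends on the SentencePiece vocabulary actually used
student_config = MarianConfig(
    vocab_size=vocab,
    pad_token_id=vocab - 1,           # keep special token ids inside the vocabulary
    decoder_start_token_id=vocab - 1,
    d_model=256,                      # embedding dimension
    encoder_layers=3,                 # hidden layers
    decoder_layers=3,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,             # feed-forward network dimension
    decoder_ffn_dim=2048,
)
student = MarianMTModel(student_config)
print(f"Student parameters: {sum(p.numel() for p in student.parameters()):,}")
```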
The models are evaluated using BLEU and COMET metrics to measure translation accuracy and fluency.
We used the Tatoeba Translation Challenge and FLORES-200 datasets for evaluating the student models.
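A minimal sketch of how such scoring can be done with the sacrebleu and unbabel-comet packages; the file names and the COMET checkpoint choice are illustrative assumptions:

```python
# Score a student's translations with BLEU (sacrebleu) and COMET (unbabel-comet).
import sacrebleu
from comet import download_model, load_from_checkpoint

sources = [l.strip() for l in open("flores.devtest.sv", encoding="utf-8")]
hypotheses = [l.strip() for l in open("student.hyp.en", encoding="utf-8")]
references = [l.strip() for l in open("flores.devtest.en", encoding="utf-8")]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet.predict(data, batch_size=16, gpus=0).system_score:.4f}")
```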
- General Performance: Student models show reduced performance when translating languages they were not directly trained on, with a pronounced decline in the East Slavic languages.
- Lexical Similarity Impact: Student models transferred better to languages with closer lexical ties to their training language, yielding higher translation accuracy; this was particularly evident among the North Germanic languages.