Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation
Course project of LDA-T313 Approaches to Natural Language Understanding
Zihao Li & Chao Wang
This repository contains the implementation of the project "Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation". The project aims to investigate the efficiency of cross-linguistic knowledge transfer in multilingual Neural Machine Translation (NMT) using knowledge distillation techniques.
The study focuses on two main objectives:
- Cross-Linguistic Knowledge Transfer: Evaluate how effectively student models trained on one language perform in translating other related languages within the same language family.
- Correlation of Language Similarity with Transfer Effectiveness: Investigate whether the effectiveness of cross-linguistic knowledge transfer correlates with the degree of linguistic similarity among languages.
We utilize two pre-trained multilingual NMT models from the Helsinki-NLP OPUS-MT project:
- opus-mt-tc-big-gmq-en: translates Danish, Norwegian, and Swedish to English.
- opus-mt-tc-big-zle-en: translates Belarusian, Russian, and Ukrainian to English.
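A minimal sketch of querying one of these teachers, assuming the Hugging Face ports of the OPUS-MT checkpoints; the example sentence and decoding settings are illustrative:

```python
# Load a teacher model and translate one Swedish sentence to English.
# The model ID is the Hugging Face port of the OPUS-MT checkpoint;
# everything else here is illustrative.
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-tc-big-gmq-en"
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name)

batch = tokenizer(["Jag har läst boken."], return_tensors="pt", padding=True)
outputs = teacher.generate(**batch, num_beams=4)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```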
Our training datasets are derived from the NLLB corpus and filtered for quality with OpusFilter. Each English-centric language pair contributes 5 million parallel sentences.
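The actual filtering is configured through OpusFilter; as an illustrative stand-in, the sketch below applies the kind of length-ratio filter OpusFilter provides (file names and thresholds are assumptions):

```python
# Keep a sentence pair only if both sides are non-empty, not overly long,
# and their token-length ratio is within bounds. This is an illustrative
# stand-in for the OpusFilter configuration used in the project.
def keep_pair(src: str, tgt: str, max_ratio: float = 3.0, max_len: int = 200) -> bool:
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if src_len == 0 or tgt_len == 0 or max(src_len, tgt_len) > max_len:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio

with open("nllb.sv", encoding="utf-8") as f_src, \
     open("nllb.en", encoding="utf-8") as f_tgt, \
     open("filtered.sv", "w", encoding="utf-8") as out_src, \
     open("filtered.en", "w", encoding="utf-8") as out_tgt:
    for s, t in zip(f_src, f_tgt):
        if keep_pair(s.strip(), t.strip()):
            out_src.write(s)
            out_tgt.write(t)
```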
The distillation process uses the outputs of pre-trained teacher models as the target translations for training smaller student models.
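A minimal sketch of this sequence-level distillation step, assuming the Hugging Face port of the zle-en teacher; the file names, batch size, and decoding settings are illustrative assumptions:

```python
# Generate teacher translations that serve as the student's training targets
# (sequence-level distillation). File names and hyperparameters are illustrative.
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-tc-big-zle-en"
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name).eval()

def translate(batch_src):
    """Translate a batch of source sentences with the teacher model."""
    enc = tokenizer(batch_src, return_tensors="pt", padding=True, truncation=True)
    out = teacher.generate(**enc, num_beams=4, max_new_tokens=256)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

with open("train.ru", encoding="utf-8") as src_file, \
     open("train.distilled.en", "w", encoding="utf-8") as tgt_file:
    buffer = []
    for line in src_file:
        buffer.append(line.strip())
        if len(buffer) == 32:
            tgt_file.write("\n".join(translate(buffer)) + "\n")
            buffer = []
    if buffer:
        tgt_file.write("\n".join(translate(buffer)) + "\n")
```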
The student models are trained with the `train.sh` script. The teacher and student architectures are summarized below:
| Parameter | Teacher | Student |
|---|---|---|
| Embedding dimension | 1024 | 256 |
| Attention heads | 16 | 8 |
| Feed-forward network dimension | 4096 | 2048 |
| Hidden layers | 6 | 3 |
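As a minimal sketch of what the student column corresponds to, the snippet below instantiates a randomly initialised student via Hugging Face's MarianConfig; the vocabulary size is a placeholder assumption, and the actual training is run through `train.sh`:

```python
# Build a student model with the dimensions from the table above.
from transformers import MarianConfig, MarianMTModel

vocab = 32000  # placeholder; depends on the SentencePiece vocabulary actually used
student_config = MarianConfig(
    vocab_size=vocab,
    pad_token_id=vocab - 1,           # keep special token ids inside the vocabulary
    decoder_start_token_id=vocab - 1,
    d_model=256,                      # embedding dimension
    encoder_layers=3,                 # hidden layers
    decoder_layers=3,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,             # feed-forward network dimension
    decoder_ffn_dim=2048,
)
student = MarianMTModel(student_config)
print(f"Student parameters: {sum(p.numel() for p in student.parameters()):,}")
```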
The models are evaluated using BLEU and COMET metrics to measure translation accuracy and fluency.
We used the Tatoeba Translation Challenge and FLORES-200 datasets for evaluating the student models.
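A minimal sketch of how such scoring can be done with the sacrebleu and unbabel-comet packages; the file names and the COMET checkpoint choice are illustrative assumptions:

```python
# Score a student's translations with BLEU (sacrebleu) and COMET (unbabel-comet).
import sacrebleu
from comet import download_model, load_from_checkpoint

sources = [l.strip() for l in open("flores.devtest.sv", encoding="utf-8")]
hypotheses = [l.strip() for l in open("student.hyp.en", encoding="utf-8")]
references = [l.strip() for l in open("flores.devtest.en", encoding="utf-8")]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet.predict(data, batch_size=16, gpus=0).system_score:.4f}")
```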
- General Performance: Student models show reduced performance when translating languages they were not directly trained on, with a pronounced decline in the East Slavic languages.
- Lexical Similarity Impact: Student models transferred better to languages with closer lexical ties to their training language, yielding higher translation accuracy; this was particularly evident among the North Germanic languages.