Towards Making the Most of ChatGPT for Machine Translation. (Full report, Findings of EMNLP 2023 accpeted version)
This repository releases the test sets evaluated by ChatGPT API (gpt-3.5-turbo-0301), for the replication of the study.
We evaluate the performance of the models on the Flores-200 and WMT19 Bio and News test sets. The task statistics are shown as follows:
-
ChatGPT's performance largely depends on the temperatures, especially in difficult languages. Generally, setting a lower temperature can result in higher performance.
The relationship between temperature and ChatGPT's performance:
- Emphasizing the task information in prompts can further improve ChatGPT's performance, especially in complex tasks.
Influence of Task-Specific Prompts (TPS) on ChatGPT:
-
Introducing the correct domain information consistently improves ChatGPT's performance while wrong domain information leads to significant degradation in performance.
Influence of Domain-Specific Prompts (DPS) on ChatGPT:
-
When tackling non-English-centric tasks (both the input and expected output are non-English), ChatGPT may generate hallucinations, which should be paid more attention to by the MT/NLP community.
The number of sentences that need to be post-preprocessed in different settings:
-
CoT leads to word-by-word translation behavior, thus bringing significant translation degradation.
The effect of CoT on ChatGPT:
Please refer to our full report for more details.
If you find this work helpful, please consider citing as follows:
@inproceedings{Peng2023ChatGPT4MT,
title={Towards Making the Most of ChatGPT for Machine Translation},
author={Peng, Keqin and Ding, Liang and Zhong, Qihuang and Shen, Li and Liu, Xuebo and Zhang, Min and Ouyang, Yuanxin and Tao, Dacheng},
booktitle={Findings of EMNLP 2023},
url={https://aclanthology.org/2023.findings-emnlp.373},
year={2023}
}