Integrating Fuzzy Matches into Sentence-level Quality Estimation for Neural Machine Translation

Authors

  • Arda Tezcan, Universiteit Gent

Abstract

Previous studies show that neural machine translation (NMT) systems produce higher-quality translations when sentences highly similar to a given input sentence (i.e. fuzzy matches; FMs) can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, FMs are integrated, through a data augmentation methodology, into a QE architecture that utilizes a pre-trained XLM-RoBERTa model. The results show that FMs improve QE performance in domain-specific scenarios when translation edit rate (TER) is used as the quality label. However, similar improvements are not observed when the same methodology is applied to a general-domain setting in which quality labels are generated either through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required to transform the MT output into its post-edited version.
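The general idea of augmenting a QE input with a fuzzy match can be illustrated with a minimal sketch. It is not the article's implementation: the similarity measure (token-level `difflib.SequenceMatcher` ratio), the 0.5 threshold, the `</s>` separator, the input layout, and the function names `fuzzy_score`, `best_fuzzy_match` and `augment` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Token-level similarity in [0, 1].

    Assumption: a simple stand-in for the edit-distance-based
    fuzzy match scores commonly used with translation memories.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return (score, tm_source, tm_target) for the most similar
    training pair at or above the threshold, or None if there is none."""
    best = None
    for tm_source, tm_target in memory:
        score = fuzzy_score(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


def augment(source, mt_output, memory, sep=" </s> ", threshold=0.5):
    """Build a QE input string: source, MT output and, when a fuzzy
    match is found, the target side of that match (hypothetical layout)."""
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return source + sep + mt_output
    return source + sep + mt_output + sep + match[2]


# Toy "training data" of (source, target) pairs, invented for this example.
memory = [
    ("the cat sat on the mat", "de kat zat op de mat"),
    ("hello world", "hallo wereld"),
]
augmented = augment("the cat sat on the rug", "de kat zat op het kleed", memory)
```

In a setup like this, the augmented string would then be tokenized and passed to the pre-trained encoder in place of the plain source and MT pair; when no sufficiently similar sentence exists, the input falls back to the unaugmented form.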

Published

2022-12-22

How to Cite

Tezcan, A. (2022). Integrating Fuzzy Matches into Sentence-level Quality Estimation for Neural Machine Translation. Computational Linguistics in the Netherlands Journal, 12, 99–123. Retrieved from https://www.clinjournal.org/clinj/article/view/150

Section

Articles
