It seems that MT developers are not really in tune with what translators would like to have. Most likely, they aim at translating as much content as possible in as little time as possible, quality becoming a (not very important) variable. Is this really so, or are there also projects devoted to excellent quality?

  • Published: 9 January 2018


  1. Aljoscha Burchardt says:

    It is true that some researchers pursue scientific goals of their own that do not immediately match the needs of translators. One reason is that MT research has traditionally focussed on gisting, or information translation, where the goal is to understand what foreign documents or web pages are about. Public funding, especially in the US, had a focus on this kind of translation. Translators need something very different: something that works like a translation memory, ideally producing perfect or nearly perfect translations, and that does not show trash to its users, which is a waste of their time.

    Another reason is that the prevailing engineering approach to MT research is statistical, i.e., data-driven, and requires immediate automated feedback about the “quality” during engine development. Unfortunately, evaluating MT quality is as difficult as, if not more difficult than, translation itself. In practice, MT researchers take reference texts and corpora that have been translated by translators and compare them automatically with the MT output. As one can imagine, the simple, surface-based algorithms that perform the comparison cannot really measure quality. Rather, they measure the “distance” to one good translation, basically how many words or sub-strings match the reference. These measures have nevertheless been shown to correlate to some extent with quality as judged by humans.
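To illustrate why such surface-based comparisons measure distance rather than quality, here is a minimal sketch in Python. It is not BLEU or any other established metric, only a simplified n-gram overlap score. The function names and the example sentences are hypothetical and chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(hypothesis, reference, n=1):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Counts are clipped, so a repeated n-gram is only credited as often
    as it appears in the reference. Tokenisation is naive whitespace
    splitting on lowercased text (an assumption of this sketch).
    """
    hyp = Counter(ngrams(hypothesis.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


reference = "the cat sat on the mat"
good_paraphrase = "a cat was sitting on the rug"  # acceptable meaning, different words
wrong_meaning = "the mat sat on the cat"          # same words, wrong meaning

print(ngram_precision(good_paraphrase, reference))  # 3 of 7 words match
print(ngram_precision(wrong_meaning, reference))    # all 6 words match
```

In this toy example the sentence with the wrong meaning scores higher than the acceptable paraphrase, because only the overlap with a single reference is counted. Using longer n-grams (for example `n=2`) penalises the reordering somewhat, but the score still reflects string similarity, not translation quality.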

    Another challenge is that these comparisons are only reliable at corpus level; they cannot assess the quality of single segments.

    For quite some time, we have been working on better ways of measuring MT quality, such as the MQM framework (see our paper “Towards a Systematic and Human-Informed Paradigm for High-Quality Machine Translation”), but the “problem” remains that real quality judgements can only be provided by humans.

    In the QT21 project we have developed specific test suites that can be used for semi-automatically assessing quality in a more analytical way.

    One development that has increased the interest in better quality measures is the turn from traditional statistical MT to neural MT. As the output of neural systems is more diverse (“creative”), the standard reference-based measures are no longer fine-grained enough to measure improvements. This gives me hope that the quality issue, and with it the needs of translators, will eventually become more prominent.