Human-Computer Interaction in Translation Activity: Fluency of Machine Translation


Digitalization is one of the key distinctive features of the modern environment and social life. More and more functions are now being transferred to artificial intelligence. How effective is the replacement of human activity with computer activity? This article addresses the question through an example of integrating digital technologies into translation activities. The paper focuses on the quality of machine translation (MT) output of legal texts in the language pair English - Slovak. It studies a Criminal Code written in Slovak which was translated by a human translator into English and subsequently translated back into Slovak by the machine translation system Google Translate (GT). Back-translation - translation of a translated text back into its original language - was used as a quality assessment tool to detect discrepancies, mistranslations and inevitable differences between the source text and the target text. The quality of the MT output was evaluated according to the Multidimensional Quality Metrics (MQM) standards, with a focus on the dimension of ‘Fluency’. Multiple comparisons were applied to determine which issues (errors) in the ‘Fluency’ dimension differ from the others. A statistically significant difference was found between ‘Agreement’ and the other issues, as well as between ‘Ambiguity’ and the other issues. The errors in ‘Agreement’ are related to the differences between the languages: English is considered a mostly analytic language, whereas Slovak is a synthetic language. The issues in ‘Ambiguity’ correlate with the type of text examined, since legal texts are characterized by relatively complicated wording and numerous terms; moreover, accuracy and unambiguity need to be preserved. Generally, the MT output is able to provide users with basic information about the text. On the other hand, most of the segments need revision and/or correction; in such cases, human intervention and post-editing are necessary.

Full Text

Introduction

Since the contemporary translation industry is constrained by various factors, such as a fast-moving society, time-saving strategies and financial costs, the use of printed dictionaries is now deemed obsolete. Translators are adopting effective tools such as Computer-Assisted Translation (CAT) tools (e.g. translation memory software, linguistic search engines or terminology management software) or Machine Translation (MT) systems. Machine translation outputs can convey the basic information or meaning of a text; however, they do not provide the user with flawless texts. To obtain correct and logically organized texts, MT outputs need to be subsequently post-edited by a human. Considering both human and machine translation, it is necessary to discuss the main principle of translation, which is the decoding of the meaning of the source text into the target text. Apart from grammatical rules - i.e. the syntactic and semantic rules characteristic of both languages - the translator needs to have a command of exceptions, idiomatic expressions, and dialectal and slang words. The situation with machine translation is much more complicated. Despite its broad vocabulary, grammatical rules and ability to translate comprehensibly, an MT system is still not sufficiently trained to convey the meaning of the text. In order to ensure a successful transfer, it is important to evaluate the suitability of machine translation for a particular type of text. Present machine translation systems are able to provide the user with output of acceptable quality for texts such as technical documents, sports summaries, weather forecasts and instruction manuals, since their vocabulary is limited. According to Munkova and Munk (2016. P. 21), MT systems try to cover all language aspects and to obtain MT output of higher quality. The source text and its translation need to fulfill the same textual criteria. Moreover, the translation needs to meet the requirements of equivalence between the source and target text. Munkova and Munk (2016. P. 63) claim that the requirement of equivalence or concordance differs from that for secondary translated texts (indirect translations) in the source language; equivalence is understood as the correspondence of the source text with a text in a different language. A principal factor in quality evaluation is the ‘raw’ MT output. They also suppose that the quality of machine-translated output is given by the number of corrections required to obtain a translation of standard quality. The method directly evaluates the translated output from the point of view of the time needed for its revision (post-editing). Melby et al. (2014. P. 279) claim that: [t]he proposed framework consists of a new and universal definition of translation quality, a recently published systematic way to construct the translation specifications that are crucial to this definition, and two types of translation-quality metrics based on this definition and specification system: a rubric approach to assessing the result of a post-editing task, and an error-category approach to assessment of translation quality, whether it is human, machine, or post-edited translation.

Related work

Svoboda (2015. P. 247) claims that the issue of machine translation can be discussed from the aspect of the process, product, user, purpose, and correspondence with norms. According to Munkova (2013. P. 22), other significant factors influencing the number of corrections of MT output are the following: editing time, readability, comprehensibility, the meaningfulness of the MT text and comprehension of the source text. To eliminate ambiguous words and polysemy and to simplify sentence structures, the translator edits the text before the translation process (pre-editing), after the translation process (post-editing), or during quality checking. Post-editing represents a key part of the MT process. In this phase, professional translators or linguists review the MT output and correct semantic and grammatical errors in the text.
The aim of pre-editing and post-editing is to obtain maximum effectiveness and high quality of MT.

Post-editing and machine translation

Čulo et al. (2014. P. 201) claim that [a] special type of translation revision is the case of post-editing MT output. Machine translation output is still quite error-prone, and poses very specific problems, as it sometimes may ’hit the nail on the head,’ but in other cases may completely fail the translation of a simple word. The role of post-editing of MT output is the following:
- to assure the same meaning in both the MT output and the human translation;
- to achieve the same comprehensibility in both translations, machine and human;
- to assure clear punctuation, with sentences starting with capital letters and finishing with full stops;
- to capture the meaning with minimum corrections (there is no need to adjust words and expressions in the MT output when they are acceptable).

Munkova and Munk (2016. P. 86) assert that the translator’s task is to correct the MT output to an acceptable level of quality. The time needed for post-editing depends on factors such as the workplace, concentration, and tolerance of the error rate. Prunč (2007. P. 31) states that [t]he most rudimentary MT output created in the early years helped to develop translation as a science because it proved that the transfer between two languages is much more complex than assumed. Post-editing has moved into focus as a more efficient and cost-effective method of translation, while the globalization trend reflects an increasing demand for translation services. Post-editing of machine translation (PEMT) is likely to become a generally accepted, separate part of the translation landscape. Two main types of post-editing are distinguished: ‘light post-editing’ and ‘full post-editing’ (with the emphasis on stylistics). In light post-editing, the aim of the translator is to provide a simple and comprehensible MT output with minimal corrections.
This kind of translation is mostly used for personal purposes, and also when a quick translation is required. Full post-editing requires more corrections, and it provides a translation of higher quality. Fully post-edited MT output is a highly comprehensible and stylistically neat translated text which can be used for external purposes as it is of publishable quality (TAUS, 2010a, 2010b). There are many studies dealing with the effectiveness of post-editing (Koponen, Salmi, 2015; Aranberri et al., 2014; Guerberof, 2014; Zhechev, 2014; Carl, Kay, 2011; García, 2010; Popović et al., 2014). Many of them show that post-editing is generally a faster process than human translation. The question is how much time post-editing saves in a real situation. Some translation agencies suggest that it saves up to 40 per cent of the time; scientists argue it is only 20 per cent. Although opinions differ, clients mostly prefer post-editing to human translation due to its low costs (Plitt, Masselot, 2010). The rising demand for post-editing services also increases the demand for trained post-editors. Machine translation represents an effective and cheaper tool for translating a great volume of texts. MT is being adopted and used by more translators and translation agencies, which thereby become more effective, faster and more in demand. Machine translation has no ambition to substitute human translation; it represents a more effective process combining editing, modification, and correction of MT output (Munkova, Munk, 2016; O’Brien et al., 2014).

Typological characteristics of English and Slovak language

According to Dolník (2013. Pp. 88-89), typological classification is based on common structures of language features regardless of their genetic, historic and areal relationships. The features occur in particular languages to various extents. The most common scheme concerning morphological patterns in languages (devised and developed by W. Schlegel, A. Schleicher, W. Humboldt, H. Steinthal, E. Sapir and V. Skalička) is as follows: (1) analytical/isolating type; (2) synthetic type - a) agglutinative and b) flexive; (3) introflexive, and (4) polysynthetic. English and Slovak, the languages examined in the research, can be generally characterized as follows: English is mainly considered an analytic language, and Slovak a synthetic language (also flexive, or flective) (Vaňko, 2015. P. 24). According to Vaňko (2015), grammatical meaning in analytic languages is expressed analytically - i.e. by separate verbs, one carrying the lexical meaning and the other an auxiliary or grammatical meaning. The English form ‘he did not write’ has the Slovak equivalent ‘nepísal’, and the English form ‘they will not go’ has the Slovak form ‘nepôjdu’. A low degree of inflection of main verbs in English correlates with a relatively high number of multiple-word verbs. Analytic languages do not, for example, mark the difference between the nominative and accusative case by form; word order in such languages is firmly fixed. In English, there is only one way to express the meaning ‘Peter loves Eve’, whereas in Slovak there are two possibilities: ‘Peter ľúbi Evu.’ and ‘Evu ľúbi Peter.’ The suffixes ‘-0’ (zero, no suffix) and ‘-u’ of the proper nouns ‘Peter-0’ and ‘Ev-u’ indicate which person takes the role of the subject (‘Peter’) and which the role of the object (‘Ev-u’). In English, personal pronouns in subject position are obligatory, as they indicate the grammatical meaning of person (due to the low degree of verb inflection): ‘I go’ (English), ‘chodím’ not ‘ja chodím’ (Slovak) (Dolník, 2013. P. 92). According to Ondruš and Sabol (1984. P. 186), the Slovak language is predominantly characterized by synthetic morphology. This is reflected in numerous forms and morphemes, or derivational affixes, which express different grammatical categories (e.g. gender, number, case), usually by one formal feature. For example, the morpheme -u in the form ‘žen-u’ (‘woman’) reflects feminine gender, singular number and accusative case.
Synthetic languages are characterized by synonymy and homonymy of case affixes. Vaňko (2015. P. 27) explains that grammatical meaning is expressed by inflection, i.e. suffixes (e.g. ‘knih-a’, ‘knih-y’ - ‘a book’, ‘of a book’/‘books’). A suffix can distinguish the grammatical meanings of a given word, e.g. the form ‘ruk-e’ - ‘to a hand’ or ‘about a hand’ - as dative or locative case. Further, Slovak is characterized by numerous verb patterns (14) and identical suffixes for expressing person and number in the present indicative (e.g. the suffix ‘-m’: číta-m, robí-m - ‘I am reading, I am making’). For more details see Welnitzová (2020).

Legal texts

Administrative style is distinguishable in the texts of official communication, exemplified by the Criminal Code of the National Council of the Slovak Republic. The text and its translations are discussed in the first part of our research. According to Koller’s theory of translation, which is based on adequacy, equivalence and a linguistic approach, legal texts belong to the group of factual texts. The function of the source text studied is predominantly informative; it aims to transfer information between professionals in the same field (Ďuricová, 2013. P. 34). Schneiderová (2013. P. 97) claims that the translation of legal texts belongs to the oldest and most important types of translation worldwide. Such texts are characterized by specific wording and terminology. In translation, the most important task is to convey the meaning from the source language into the target language with the emphasis on the text as a whole. Gromová and Müglová (2005. Pp. 90-91) suggest that in the translation of legal texts, translators need to take into consideration the concept of a unified legal system. When the legal terminology of the target language does not provide the translator with an adequate term, they search for other translation solutions, mostly word-for-word equivalents. The examined text has a primarily informative function, i.e. it highlights content and information.
Müglová (2009. P. 218) claims that texts with a predominantly informative function are structured at the semo-syntactic level. The aim of the translation is to convey the overall and complete content, considering the dominant standards of the reader’s culture (Azizi et al., 2020; Khonamri et al., 2020). Some translatologists suppose that the texts of laws and agreements do not fully fit the informative function (Ďuricová, 2013). Newmark (1982. Pp. 13-15) considers regulations and laws conative, i.e. vocative texts. In legal language, there are two functions of language: informative and regulatory, i.e. descriptive and prescriptive in terms of legal terminology. Laws, codes, agreements, and legal regulations are legal texts with a predominantly prescriptive function (Bocquet, 1994. P. 2).

Back translation

Back translation - translation of a translated text back into its original language - has been used primarily as a translation quality assessment tool and a standard translation procedure (Dept et al., 2017; Harkness et al., 2010). It has been widely used to allow researchers to make inferences about the quality of a translation (Brislin, 1970, 1984) or to show the extent of equivalence between the source text and the target text (Chidlow et al., 2014). Its purpose is to evaluate the quality of a translation by comparing the back translation with the source text (Harkness, Schoua-Glusberg, 1998). Son (2018) suggests using back translation not as a translation quality assessment tool, but as a documentation tool. Back translation can then support explanatory prose justifying translation decisions and show the differences between the source text and the translation or between different translated versions of the same text. This approach is not intended to check the quality of the translation but instead to enhance the documentation of translation decisions, and also to promote the harmonization of translations between languages.
Translation quality assessment and machine translation

According to Munkova and Munk (2016. P. 75), the evaluation of machine translation concerns both the quality of the machine translation and the user’s satisfaction. Nowadays, there are various models and approaches to MT evaluation. The most popular is White’s model (2003), which focuses on feasibility, internal evaluation, declarative evaluation, usability, operational evaluation, and comparison. Melby et al. (2014. P. 287) introduce two basic approaches: the ‘error-count approach’ (or analytical approach, since it expresses quantity and frequency as a percentage), which works by determining errors and their quantity, and the ‘rubric approach’, in which post-editors evaluate the translation on a scale of 1-5 (1 meaning that the translation does not meet the requirements; 5 meaning that it meets the requirements in all categories).

Multidimensional quality metrics and machine translation

Since there is no special error typology for evaluating and measuring translation quality for languages like Slovak (a synthetic language with many inflectional morphemes), and other existing evaluation methods do not provide a complex error typology according to which an adequate evaluation could be carried out, we decided to use the general framework MQM (Multidimensional Quality Metrics). MQM is a framework for the description of translation quality with regard to the logic and coherence of the text. Although MQM is used to evaluate various aspects of translation, in our paper we study and discuss the dimension (category) of ‘Fluency’ in more detail. The MQM framework was proposed by the German Research Center for Artificial Intelligence (DFKI) in 2015. It defines quality as the 'adherence of the text to appropriate specifications'. The quality of a translation reflects the accuracy and fluency of the text designated for users for their specific purposes.
Specifications are characterized as a 'description of the requirements for the translation' (MQM, 2016). MQM helps to achieve a fast, high-quality and holistic evaluation with a focus on the text as a whole. At the same time, it analyses specific issues of a particular text. It can be used for automatic or manual evaluation of any type of text. The German institution dealing with the evaluation of translation quality claims that MQM 'allows quality to be evaluated along multiple dimensions, allowing you to identify specific problems and understand the strengths and weaknesses of specific translations' (MQM, 2016). The metrics do not track all of the mentioned dimensions in each type of text; they focus on a particular, most distinctive dimension of the text. In the case of different translations of one source text (original), MQM provides an overview of the specific features and qualities of each translation. By comparing and evaluating them, an evaluator can choose the most adequate solution for a particular situation. MQM represents a descriptive list of more than 100 metrics to assess the quality of translation and to identify specific issues in given texts. The issues are hierarchically structured, proceeding from marginal to the detailed ones. The framework is a wide-ranging scheme covering clear and unambiguous translation issues, e.g. ‘Mistranslation’, as well as more complex ones, e.g. ‘Design’. The most intricate issue is the adequacy of translation from the source language into the target language with regard to cultural specifics. As all the issues are clearly defined, the evaluator is able to describe the quality of a translation and its specifics in an objective and clear way. For these purposes, the standards applying translation specifications according to ASTM F2575 (Standard Guide for Quality Assurance in Translation) can be helpful. The metrics represent the issues which can be found in the text.
The eight main dimensions of the MQM framework are the following: ‘Accuracy, Design, Fluency, Internationalization, Locale convention, Style, Terminology, and Verity’. The dimension ‘Other’ covers the issues which cannot be classified in other dimensions of MQM; the dimension ‘Compatibility’ includes issues taken from legacy metrics that are not considered appropriate for general use in MQM. The most frequently addressed dimensions are ‘Accuracy’ and ‘Fluency’. The main dimensions are further divided into several sub-dimensions; thus the MQM core contains 20 categories covering the most frequent issues. The dimensions ‘Accuracy, Fluency, and Verity’ are the most wide-ranging. In our study, we deal with the dimension of ‘Fluency’ and its sub-dimensions, as designed in MQM. MQM defines ‘Fluency’ as the category which ‘includes those issues about the linguistic 'well-formedness' of the text that can be assessed without regard to whether the text is a translation or not. Most Fluency issues apply equally to source and target texts’ (MQM, 2016). Fluency is highly dependent on the grammar of the language, and since Slovak is considered a synthetic language (with numerous morphemes and inflections), fluency affects the comprehensibility of the text most. The dimension ‘Fluency’ includes the sub-dimensions of ‘grammar, grammatical register, inconsistency, spelling, typography, and unintelligible’.

Research objective

The idea of back-translation appears in statistical machine translation in various contexts, for example in semi-supervised learning (Bojar, Tamchyna, 2011) or in self-training (Goutte et al., 2009). Generally, the back-translation approach still improves translation accuracy in all language pairs in a low-resource setting (Hoang et al., 2018. P. 18).
The aim of the research is to examine the quality of machine translation (MT) output of legal texts (a Criminal Code written in the Slovak language) which was translated by a human translator into English and subsequently translated back into Slovak by the machine translation system Google Translate (GT). Using the back-translation method (translation of a translated text back into its original language), we carried out an assessment based on the Multidimensional Quality Metrics (MQM) framework to detect discrepancies, mistranslations and inevitable differences between the source text and the target text. We evaluated the quality of the MT output according to the MQM standards with a focus on the dimension of ‘Fluency’. Following the aim of the research, we defined these assumptions:
1) we assume that the occurrence of the issues defined in MQM correlates with the style (type) of the studied text. This would be reflected in the occurrence of particular issues, and some issues may not occur in the examined text at all;
2) we assume a certain degree of relationship among the ‘Fluency’ issues. On the other hand, we assume there are statistically significant differences in the occurrence of ‘Fluency’ issues in a text of administrative style;
3) we assume that the number of ‘Fluency’ issues (according to MQM) correlates with the number of words in the MT output in particular segments;
4) we assume that the number of ‘Fluency’ issues (according to MQM) correlates with the number of final corrections (edited times) of particular segments.
In other words, using correlation analysis and multiple comparisons, we analyze the occurrence and relationships of errors in the category of ‘Fluency’ defined by MQM.
Methods

Our research was carried out on the principles of back-translation: the original document was translated by a human translator from Slovak into English and subsequently translated by Google Translate (a commonly used MT system) from English into Slovak (ST_SK=>HT_EN=>MT_SK). We examined a legal text containing relatively complicated and structured wording with numerous terms. The source text was the ‘Criminal Code’ in the Slovak language (16 standard pages), issued by the National Council of the Slovak Republic and translated by a human translator (the certified translator Mária Ďurčová) into English. Since the original ‘Criminal Code of the National Council of the Slovak Republic’, established for the Slovak legal system, is an official document written in the mother tongue, we considered this text flawless and natural. In the MT output, we identified the issues related to the ‘Fluency’ dimension of MQM: 1. ambiguity, 2. character-encoding, 3. coherence, 4. cohesion, 5. corpus-conformance, 6. duplication, 7. grammar, 8. grammatical-register, 9. inconsistency, 10. index-toc, 11. broken-link, 12. nonallowed-characters, 13. offensive, 14. pattern-problem, 15. sorting, 16. spelling, 17. typography, and 18. unintelligible. Due to the character of the examined text, we assumed that some errors would not be identified in the text. After the identification and analysis of errors (see the sample in Table 1), we evaluated the MT output. We calculated the frequency of the errors and counted the segments which needed post-editing. Subsequently, we post-edited the MT output segment by segment (1 sentence representing 1 segment). We then compared the MT output with the human translation (HT_EN) and, for incomprehensible segments, with the original ST_SK text.
The MT output was post-edited in the virtual environment OSTEPERE, a system for translation, post-editing and evaluation of machine translation (Munková et al., 2016; Benko and Munková, 2016), in which post-editing, the classification of errors, and even keyboard time, thinking time and edited time were recorded.

Table 1 Scheme of the identification of errors in the MT output (examples)

| ID | ST_SK | HT_EN | MT_SK | Type of error |
|---|---|---|---|---|
| 11 | PRVÁ ČASŤ | PART ONE | ČASŤ PRVÁ | Word-order |
| 12 | VŠEOBECNÁ ČASŤ | GENERAL PART | GENERAL PART | Untranslated |
| 13 | § 1 | Section 1 | Sekcia 1 | Terminology |
| 14 | Predmet zákona | Purpose of the Act | účel zákona | Terminology, capitalization |

Note: ST_SK (original text for translation in Slovak), HT_EN (human translation from Slovak into English), MT_SK (machine translation from English into Slovak).

To test the differences between the dependent samples (‘Fluency’ MQM errors) and to determine the degree of agreement, we used nonparametric methods - Kendall's coefficient of agreement and Cochran's Q test - since the examined variables are binary. For multiple comparisons, we used the parametric but sufficient Tukey's HSD test, in which the average error rate represents the proportion of errors (relative error rate), given the binary character of the examined variables. When discussing the error rate of individual areas of errors, we used the interpretation of the dependence rate according to Cohen (1988): below 10% meaning trivial incidence, 10-30% low, 30-50% medium, 50-70% high, and above 70% very high incidence.

Results

After the identification of the errors, we found that some issues from the ‘Fluency’ dimension of MQM did not occur, and we therefore did not consider them in the further analysis: 5. corpus conformance, 9.1 inconsistent abbreviations, 9.2 images vs. text, 9.3 inconsistent link, 9.4 external inconsistency, 10.1 index/toc format, 10.2 missing/incorrect toc item, 10.3 page references, 11.1 document external link, 11.2 document internal link, 12. nonallowed characters, 13. offensive, 14. pattern problem, 15. sorting, 16.2 diacritics, 17.2 unpaired marks and 17.3 whitespaces. To determine the significance of the occurrence of ‘Fluency’ issues, some variables needed to be transformed (binarized) to 0/1. An issue such as ‘coherence’ could occur only once in a segment, whereas an issue such as ‘function words’ could occur more than once. To meet the criteria of Cochran’s Q test, every issue that occurred more than once in a particular segment was recoded as 1. We thus restricted the occurrence of errors to 0/1 and did not record the frequency of the given issue in the segment, i.e. the issues were represented by binary variables. We stated the null hypothesis H0: ‘There are no statistically significant differences in occurrence among ‘Fluency’ issues in the text of administrative style’. Based on the results of Cochran’s Q test (Q = 476.3583, df = 14, p < 0.000001), we reject the null hypothesis at the 0.001 significance level, i.e. there are statistically significant differences in the occurrence of the given ‘Fluency’ issues. After the null hypothesis was rejected, we were interested in the statistically significant differences among the categorized errors. Based on multiple comparisons (Tukey HSD test), we identified 7 homogeneous groups, i.e. groups of issues with an approximately equal proportion of occurrence (no statistically significant differences) in the examined text (Table 2).

Table 2 Visualization of the homogeneous groups, arranged from the minimal to the maximum occurrence

| Fluency issues (MQM) | Mean | Homogeneous groups (1-7) |
|---|---|---|
| 8. grammatical-register | 0.005102 | **** |
| 2. character-encoding | 0.025510 | **** **** |
| 6. duplication (0/1) | 0.035714 | **** **** |
| 7.2.2. part-of-speech (0/1) | 0.040816 | **** **** **** |
| 7.2.3. tense-mood-aspect (0/1) | 0.112245 | **** **** **** **** |
| 7.3. word-order | 0.137755 | **** **** **** **** |
| 1.1 unclear-reference | 0.158163 | **** **** **** |
| 7.1. function-words (0/1) | 0.229592 | **** **** **** |
| 17.1. punctuation (0/1) | 0.244898 | **** **** |
| 18. unintelligible | 0.336735 | **** |
| 3. coherence | 0.336735 | **** |
| 4. cohesion | 0.336735 | **** |
| 16.1. capitalization (0/1) | 0.336735 | **** |
| 7.2.1 agreement (0/1) | 0.464286 | **** |
| 1. ambiguity (0/1) | 0.489796 | **** |

The lowest occurrence was found for issues 8. to 1.1 (grammatical-register, character-encoding, duplication, part-of-speech, tense-mood-aspect, word-order, and unclear-reference), which affected at most 16% of sentences. The highest occurrence (more than 20%) was found for issues 7.1 to 1. (function-words, punctuation, unintelligible, coherence, cohesion, capitalization, agreement (46.4%) and ambiguity (49%)). A statistically significant difference was found between ‘Agreement’ and the other issues, as well as between ‘Ambiguity’ and the other issues. The most frequent issue was ‘Ambiguity’ (in 96 of 196 segments). This type of issue is related to both the source and the target text. In the process of translation, ‘Ambiguity’ needs to be replaced by the issue ‘ambiguous-translation’, which is related to the terminology and standard language typical of a given type of text. The second most frequent issue was ‘Agreement’ (in 91 of 196 segments). The errors in the category of ‘Agreement’ mostly concern inflection in the Slovak language: in segment 11, the issue ‘Agreement’ was identified 6 times, in some segments it occurred just once, and the maximum occurrence was 21 times in one segment.

Since the variables (‘Fluency’ issues) are not normally distributed, we used nonparametric correlations (Kendall Tau correlations) for the last two assumptions. We tested the null hypothesis H0: ‘Occurrence of ‘Fluency’ issues does not correlate with the number of words in MT output in the given segment’.
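Before turning to the correlation results, the binarization step and the Cochran's Q statistic reported earlier in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `binarize` and `cochran_q` and the toy data are hypothetical, and the sketch returns only Q and its degrees of freedom (k - 1 for k issues, which gives df = 14 for the 15 issues analysed in the article). A p-value would be obtained from the chi-square distribution with those degrees of freedom.

```python
from typing import List, Sequence, Tuple


def binarize(counts: Sequence[Sequence[int]]) -> List[List[int]]:
    """Collapse per-segment error counts to 0/1 (issue absent / present)."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def cochran_q(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Return (Q, df) for a segments x issues table of 0/1 values."""
    k = len(table[0])  # number of issues (columns)
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined: every row is all zeros or all ones")
    return numerator / denominator, k - 1


# Hypothetical toy data: 6 segments x 3 issues, raw error counts per segment.
toy_counts = [
    [2, 1, 0],
    [1, 0, 0],
    [3, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [4, 2, 0],
]

q, df = cochran_q(binarize(toy_counts))
print(f"Q = {q:.4f}, df = {df}")  # Q = 6.0000, df = 2
```

Binarizing first is what makes the test applicable: Cochran's Q compares proportions of segments affected by each issue, so an issue that occurs several times in one segment counts once.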
Since no binarization was needed for this correlation analysis, we used the original simple frequency of occurrence of the issues found in the segments.

Table 3 Correlation analysis results between Fluency issues & number of words in MT

| Fluency issues | Valid N | Kendall Tau | Z | p-value |
|---|---|---|---|---|
| 1. ambiguity | 196 | 0.329420 | 6.85656 | 0.000000 |
| 1.1 unclear-reference | 196 | 0.186820 | 3.88849 | 0.000101 |
| 2. character-encoding | 196 | -0.043948 | -0.91473 | 0.360331 |
| 3. coherence | 196 | 0.504738 | 10.50563 | 0.000000 |
| 4. cohesion | 196 | 0.507195 | 10.55677 | 0.000000 |
| 6. duplication | 196 | 0.189394 | 3.94206 | 0.000081 |
| 7.1. function-words | 196 | 0.349012 | 7.26435 | 0.000000 |
| 7.2.1 agreement | 196 | 0.614185 | 12.78366 | 0.000000 |
| 7.2.2. part-of-speech | 196 | 0.152336 | 3.17072 | 0.001521 |
| 7.2.3. tense-mood-aspect | 196 | 0.300947 | 6.26392 | 0.000000 |
| 7.3. word-order | 196 | 0.295293 | 6.14624 | 0.000000 |
| 8. grammatical-register | 196 | 0.102148 | 2.12610 | 0.033495 |
| 16.1. capitalization | 196 | -0.378451 | -7.87710 | 0.000000 |
| 17.1. punctuation | 196 | 0.445918 | 9.28135 | 0.000000 |
| 18. unintelligible | 196 | 0.476232 | 9.91232 | 0.000000 |

Except for the ‘character-encoding’ issue (for which the Kendall Tau coefficient is statistically insignificant), all identified ‘Fluency’ issues correlate with the number of words in the MT output in the given segment (Table 3). The ‘character-encoding’ dimension has no sub-dimensions and is related to both source and target texts; it covers characters that are garbled due to incorrect encoding. The issue ‘Capitalization’ is also associated with the number of words in the MT output, but the two values change in opposite directions, i.e. as the number of words increases, the occurrence of ‘Capitalization’ decreases. This seems quite natural for the type of text examined (‘Criminal Code’), where the names of sections and the definitions of terms consisting of one or more words (3 at most) are capitalized. Similarly, we tested the last null hypothesis H0: ‘The occurrence of ‘Fluency’ issues does not correlate with the number of final corrections (edited times) in a segment’.
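The rank correlations used for these two hypotheses can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the article does not say which variant of Kendall's tau its software computed, so the tau-b variant (which adjusts for ties) is assumed here, and the names `kendall_tau_b` and `tau_to_z` are hypothetical. The second function uses the usual large-sample normal approximation without tie correction; with N = 196 it reproduces the Z value reported for ‘ambiguity’ in Table 3 from its tau.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b computed by comparing all pairs of observations."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


def tau_to_z(tau: float, n: int) -> float:
    """Normal approximation for testing tau = 0 (no tie correction)."""
    return tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


# Hypothetical toy data: words per segment vs. errors found in that segment.
words = [3, 5, 8, 12, 20]
errors = [0, 1, 0, 2, 4]
print(round(kendall_tau_b(words, errors), 4))

# Tau reported for 'ambiguity' in Table 3, with N = 196 segments.
print(round(tau_to_z(0.329420, 196), 3))  # 6.857
```

The pairwise loop is quadratic in the number of segments, which is unproblematic for 196 segments and keeps the definition of concordant and discordant pairs visible.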
The results (Table 4) show that the issues ‘Ambiguity’, ‘Coherence’, ‘Cohesion’, ‘Agreement’, ‘Tense-mood-aspect’, and ‘Unintelligible’ correlate with the number of corrections; for the remaining issues, the Kendall Tau coefficient is statistically insignificant. Some segments (e.g. segment 88 or 89) were post-edited by the post-editor more than once.

The issue ‘Coherence’ has no sub-dimensions, and it is related to both texts: source text and target text. It reflects the relationship between two or more semantic features in the text, where one feature anticipates another feature and their interpretation is mutually dependent. ‘Cohesion’ is connected with ‘Coherence’: ‘Cohesion’ applies to incorrect or missing elements which are needed for the intended meaning of the text, whereas ‘Coherence’ considers the text as a whole. The dimension ‘Unintelligible’ represents an issue in both source and target language, and it is often connected with spelling and grammar.

Table 4 Correlation analysis results between Fluency issues & number of final corrections (edited times)

| Fluency issues | Valid N | Kendall Tau | Z | p-value |
|---|---|---|---|---|
| 1. ambiguity | 196 | 0.152155 | 3.16696 | 0.001540 |
| 1.1 unclear-reference | 196 | 0.075037 | 1.56182 | 0.118331 |
| 2. character-encoding | 196 | -0.048238 | -1.00403 | 0.315362 |
| 3. coherence | 196 | 0.181856 | 3.78517 | 0.000154 |
| 4. cohesion | 196 | 0.221285 | 4.60584 | 0.000004 |
| 6. duplication | 196 | 0.042425 | 0.88303 | 0.377220 |
| 7.1. function-words | 196 | 0.057085 | 1.18816 | 0.234768 |
| 7.2.1 agreement | 196 | 0.199313 | 4.14852 | 0.000033 |
| 7.2.2. part-of-speech | 196 | 0.035882 | 0.74684 | 0.455159 |
| 7.2.3. tense-mood-aspect | 196 | 0.252333 | 5.25208 | 0.000000 |
| 7.3. word-order | 196 | -0.011034 | -0.22967 | 0.818352 |
| 8. grammatical-register | 196 | -0.021350 | -0.44439 | 0.656762 |
| 16.1. capitalization | 196 | -0.084354 | -1.75574 | 0.079132 |
| 17.1. punctuation | 196 | 0.054714 | 1.13883 | 0.254776 |
| 18. unintelligible | 196 | 0.181856 | 3.78517 | 0.000154 |

Discussion and Conclusion

The given article introduces the issue of machine translation, a relatively new topic in our academic and professional environment.
Its task was to present the most serious errors of machine translation in the English - Slovak direction, based on the typological characteristics of the languages and classified according to MQM. Using correlation analysis and multiple comparison, we aimed to analyze the occurrence and relationships of error categories in the sphere of ‘Fluency’ defined by MQM. We found out that 17 sub-dimensions of MQM were not applicable in the examined text. They are mostly connected with additional text information (e.g. web links and references) which is not typical for legal texts. We can generally state that the MT output is comprehensible and a reader can get the meaning of the text. However, most of the segments require revision, since legal texts are characterized by unambiguity and accuracy. The discrepancies between the human translation and the MT output were caused by inadequate machine translation transfer; the errors mainly concern inflection, conjunction, and terminology. Comparing the ST_SK text (source text in Slovak) and the MT_SK text (MT output in the source language), we can say that the MT_SK text contained numerous errors in the category of ‘Fluency’. Most errors were related to the dimension of ‘Agreement’ (237 errors in 91 out of 196 segments) and were closely related to the incorrect use of suffixes. They primarily affected not the meaning of the words but the overall comprehensibility of the output. The post-editing process primarily concerned the correction of errors related to grammar. Basically, Google Translate managed to convey the basic meaning of the source text into the MT output, though with relatively significant differences, ambiguities and difficulties in tracking logical connections in the text. Thus, ‘Ambiguity’ (98 errors in 96 segments) is considered the most frequent issue in the MT output. The incorrect terms, vague references or expressions, and unclear contexts needed human intervention.
Since legal texts are characterized by unambiguity and accuracy, post-editing of the given text from both points of view would be needed. Otherwise, incorrect translation could lead to misunderstanding of the text. For example, the segment of the ST_SK text containing the term ‘zákon’ (law) was translated as ‘law’ by the human translator, but as ‘zákonník’ (code) by Google Translate. According to the Cambridge online dictionary, ‘law’ is ‘a rule made by a government that states how people may and may not behave in society and in business, and that often orders particular punishments if they do not obey, or a system of such rules’ (e.g. civil/criminal law), whereas ‘code’ is ‘a set of rules and laws’ (e.g. the state's legal code). The shift in the meaning (law - code) would cause misunderstanding of the term and an error in accuracy. The MT's suggestion ‘trest odňatia pokutu’ in segment 77 represents another incorrect term translation. At first sight, it seems to refer to the term ‘trest odňatia slobody’ (custodial penalty). The incorrect translation is used in four cases (out of six); the correct term ‘trest odňatia slobody’ is used in two translations. It is obvious that these segments need to be post-edited; otherwise, the incorrect translation can lead to misunderstanding of the text. A similar problem can be noticed in segment 96, ‘dôstojník Zboru väzenskej a justičnej stráže SR’ (an officer of the Corps of Prison and Court Guard of the Slovak Republic). The source text refers to the ‘príslušník Zboru väzenskej a justičnej stráže SR’ (a member of the Corps of Prison and Court Guard of the Slovak Republic). As there is a significant difference between the terms ‘officer’ and ‘member’ in this context, we can identify a serious terminology error which is repeated throughout the MT output. Nowadays, machine translation represents an effective tool for many translators and translation agencies.
The reasons are numerous: the volume of texts to be translated, the greater consistency of computers in translating terms compared with human translators, reduced translation costs, and the good-enough quality of MT outputs. After examining the legal text in our research, we can state that although the comprehensibility of the MT output was adequate, it contained numerous errors in the categories of ‘Agreement’ and ‘Ambiguity’. If the output is to be applicable in practice, it would need detailed post-editing in terms of the mentioned categories and, of course, in terms of terminology and accuracy.
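The counts reported in the discussion above (237 ‘Agreement’ errors in 91 segments and 98 ‘Ambiguity’ errors in 96 segments, out of 196 segments) can be put side by side with a short calculation. This is only a sketch over the figures quoted in the text; the function name `summarize` and the "errors per affected segment" measure are introduced here for illustration and are not part of the study.

```python
TOTAL_SEGMENTS = 196  # number of segments analysed, as reported in the text

# issue -> (total errors, segments containing at least one such error)
reported = {
    "agreement": (237, 91),
    "ambiguity": (98, 96),
}


def summarize(errors, affected, total=TOTAL_SEGMENTS):
    """Share of affected segments and mean error count within them."""
    return {
        "share_of_segments": affected / total,
        "errors_per_affected_segment": errors / affected,
    }


for issue, (errors, affected) in reported.items():
    s = summarize(errors, affected)
    print(
        f"{issue}: {s['share_of_segments']:.1%} of segments, "
        f"{s['errors_per_affected_segment']:.2f} errors per affected segment"
    )
```

On these figures, ‘Ambiguity’ touches slightly more segments (49.0% against 46.4%), while ‘Agreement’ errors cluster more densely (about 2.60 against 1.02 per affected segment), which is consistent with ‘Agreement’ accounting for the largest number of errors and ‘Ambiguity’ being the most widespread issue.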


About the authors

Katarina Welnitzova

Constantine the Philosopher University in Nitra

Author for correspondence.

assistant of Translation Studies at the Department of Translation Studies

67 Stefanikova St, Nitra, 949 01, Slovak Republic

Barbara Jakubickova

Constantine the Philosopher University in Nitra


PhD student at the Department of Translation Studies

67 Stefanikova St, Nitra, 949 01, Slovak Republic

Roman Králik

Kierkegaard Society in Slovakia; Kazan (Volga region) Federal University


PhD., Professor of Philosophy, is President of Kierkegaard Society in Slovakia and Central European Research Institute of Soren Kierkegaard (Sala, Slovakia); senior research fellow at Scientific and Educational Center of Pedagogical Researches at the Kazan (Volga region) Federal University (Kazan, Russia).

18 Hurbanova St, Sala, 92701, Slovak Republic; 18 Kremlyovskaya St, Kazan, 420008, Russian Federation


  1. Absolon, J., Munková, D., & Welnitzová, K. (2018). Machine Translation: Translation of the Future? Machine Translation in the Context of the Slovak Language. Praha: Verbum.
  2. Aranberri, N., Labaka, G., Díaz de Ilarraza, A., et al. (2014). Comparison of post-editing productivity between professional translators and lay users. Proceedings of the Third Workshop on Post-Editing Technology and Practice (WPTP-3), 20-33.
  3. Azizi, M., Tkacova, H., Pavlikova, M., & Jenisova, Z. (2020). Extensive Reading and the Writing Ability of EFL Learners: The Effect of Group Work. European Journal of Contemporary Education, 9(4), 726-739.
  4. Bocquet, C. (1994). Pour une méthode de traduction juridique. Prilly: CB Service.
  5. Bojar, O., & Tamchyna, A. (2011). Improving translation model by monolingual data. Proceedings of the Sixth Workshop on Statistical Machine Translation, 330-336.
  6. Brislin, R.W. (1970). Back-Translation for Cross-Cultural Research. Journal of Cross-Cultural Psychology, 1(3), 185-216.
  7. Brislin, R.W. (1986). The Wording and Translation of Research Instruments. In W.L. Lonner, & J.W. Berry (Eds.), Field Methods in Cross-Cultural Research (pp. 137-164). Newbury Park, CA: Sage.
  8. Carl, M., & Kay, M. (2012). Gazing and Typing Activities during Translation: A comparative study of translation units of professional and student translators. Meta, 56(4), 89-111.
  9. Čulo, O., Gutermuth, S., Hansen-Schirra, S., et al. (2014). The Influence of Post-Editing on Translation Strategies. In Sh. O’Brien et al. (Eds.), Post-editing of Machine Translation. Processes and Applications (pp. 200-218). Newcastle upon Tyne: Cambridge Scholars Publishing.
  10. Dept, S., Ferrari, A., & Halleux, B. (2017). Translation and cultural appropriateness of survey material in large-scale assessments. In P. Lietz, J.C. Cresswell, K.F. Rust & R.J. Adams (Eds.), Implementation of Large-Scale Education Assessments, 153-172. Hoboken, NJ: John Wiley & Sons.
  11. Dolník, J. (2013). Všeobecná Jazykoveda. Opis a Vysvetľovanie Jazyka. Bratislava: Veda.
  12. Ďuricová, A. (2013). Typológia právnych textov justičných orgánov. Od textu k prekladu VIII. Praha: Jednota tlumočníků a překladatelů.
  13. García, I. (2010). Is machine translation ready yet? Target, 22(1), 7-21.
  14. Goutte, C., Cancedda, N., Dymetman, M., & Foster, G. (2009). Learning Machine Translation. The MIT Press.
  15. Gromová, E., & Müglová, D. (2005). Kultúra - interkulturalita - translácia. Nitra: Univerzita Konštantína Filozofa v Nitre.
  16. Guerberof, A. (2014). Correlations between productivity and quality when post-editing in a professional context. Machine translation, 28(3-4), 165-186.
  17. Harkness, J., & Schoua-Glusberg, A. (1998). Questionnaires in translation. In J. Harkness (Ed.), Cross-Cultural Survey Equivalence (pp. 87-126). Mannheim: ZUMA-Nachrichten Spezial 3.
  18. Harkness, J., Villar, A., & Edwards, B. (2010). Translation, adaptation, and design. In J.A. Harkness, M. Braun, B. Edwards, T.P. Johnson, L. Lyberg, P.P. Mohler & T.W. Smith (Eds.), Wiley series in survey methodology. Survey methods in multinational, multiregional, and multicultural contexts (pp. 115-140). Hoboken, NJ: John Wiley & Sons.
  19. Hoang, C.D., Koehn, P., Haffari, G., & Cohn, T. (2018). Iterative Back-Translation for Neural Machine Translation. Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, 18-24.
  20. Chidlow, A., Plakoyiannaki, E., & Welch, C. (2014). Translation in cross-language international business research: beyond equivalence. Journal of International Business Studies, 45(5), 562-582.
  21. Khonamri, F., Ahmadi, F., Pavlikova, M., & Petrikovicova, L. (2020). The Effect of Awareness Raising and Explicit Collocation Instruction on Writing Fluency of EFL Learners. European Journal of Contemporary Education, 9(4), 786-806.
  22. Koponen, M., & Salmi, L. (2015). On the correctness of machine translation: A machine translation post-editing task. Journal of Specialised Translation, 23, 118-136.
  23. Melby, A., Fields, P.J., & Housley, J. (2014). Assessment of Post-Editing via Structured Translation Specifications. In Sharon O’Brien et al. (Eds.), Post-editing of Machine Translation. Processes and Applications (pp. 274-299). Newcastle upon Tyne: Cambridge Scholars Publishing.
  24. MQM. (2016). Multidimensional Quality Metrics (MQM) Definition. Retrieved July, 2019, from
  25. Müglová, D. (2009). Komunikácia Tlmočenie Preklad alebo Prečo spadla Babylonská veža? Nitra: ENIGMA.
  26. Munkova, D. (2013). Prístupy k strojovému prekladu (modely, metódy a problémy strojového prekladu). Nitra: Univerzita Konštantína Filozofa v Nitre.
  27. Munkova, D., & Munk, M. (2016). Evalvácia strojového prekladu. Nitra: Univerzita Konštantína Filozofa v Nitre.
  28. Newmark, P. (1982). Approaches to Translation. Oxford: Pergamon Press Ltd.
  29. O’Brien, S., Balling, L. W., et al. (2014). Post-editing of Machine Translation: Processes and Applications. Newcastle upon Tyne: Cambridge Scholars Publishing.
  30. Ondruš, S., & Sabol, J. (1984). Úvod do štúdia jazykov. Bratislava: SPN.
  31. Plitt, M., & Masselot, F. (2010). A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context. The Prague Bulletin of Mathematical Linguistics, 93, 7-16.
  32. Popović, M., Lommel, A., Burchardt, A., et al. (2014). Relations between different types of post-editing operations, cognitive effort and temporal effort. Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014, 191-198.
  33. Prunč, E. (2007). Entwicklungslinien der Translationswissenschaft: Von den Asymmetrien der Sprachen zu den Asymmetrien der Macht. Berlin: Frank & Timme.
  34. Schneiderová, A. (2013). Klasifikácia právnych textov a problematika ich prekladu. Od textu k prekladu VIII. Praha: Jednota tlumočníků a překladatelů.
  35. Son, J. (2018). Back translation as a documentation tool. The International Journal for Translation and Interpreting, 10(2), 89-100.
  36. Svoboda, T. (2015). Hodnocení kvality strojového překladu. Kvalita a hodnocení překladu: Modely a aplikace. Olomouc: Univerzita Palackého v Olomouci.
  37. TAUS. (2010a). Machine Translation Post-editing Guidelines. Technical report. Retrieved July, 2019, from
  38. TAUS. (2010b). Post-editing in Practice. A TAUS Report. Technical report. Retrieved July, 2019, from
  39. Vanko, J., & Auxova, D. (2015). Morfológia slovenského jazyka. Nitra: UKF.
  40. Welnitzová, K. (2020). Chybovosť v predikatívnosti a kvalita strojového prekladu. Jazyk a Kultúra, 11(41-42), 160-172.
  41. White, J.S. (2003). How to evaluate machine translation. In H. Somers (Ed.), Computers and Translation: A translator's guide, 211-244.
  42. Zehnalová, J., Chromá, M., et al. (Eds.). (2015). Kvalita a hodnocení překladu: Modely a aplikace. Olomouc: Univerzita Palackého v Olomouci.
  43. Zhechev, V. (2014). Analysing the Post-Editing of Machine Translation at Autodesk. In Sh. O’Brien et al. (Eds.), Post-Editing of Machine Translation. Processes and Applications (pp. 2-13). Newcastle upon Tyne: Cambridge Scholars Publishing.

Copyright (c) 2021 Welnitzova K., Jakubickova B., Králik R.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
