DeepL VS. ChatGPT: MACHINE TRANSLATION EVALUATION

Authors

  • Yulia Milka Nugraheni, Gadjah Mada University
  • Adi Sutrisno

DOI:

https://doi.org/10.36277/jurnalprologue.v10i2.174

Keywords:

Machine Translation Evaluation, Error Analysis, BLEU, DeepL, ChatGPT

Abstract

This research aims to evaluate the performance of DeepL and ChatGPT in translating academic text through human and machine evaluation. Furthermore, this research is expected to give readers an overview of the translations produced by DeepL and ChatGPT. DeepL and ChatGPT are two machine translation systems that use the latest technology in machine translation, Natural Language Processing (Jiao et al., 2023). The evaluation was conducted using Koponen's (2010) Error Analysis and the automated machine translation metric of Papineni et al. (2002), the Bilingual Evaluation Understudy (BLEU). The evaluation applied both qualitative and quantitative methods in order to draw a stronger conclusion. The results show that DeepL performed better than ChatGPT: in the Error Analysis evaluation, 25 errors were found in the DeepL translation and 26 errors in the ChatGPT translation, while in the BLEU evaluation, the final score of the DeepL translation is 0.9446657236 and the BLEU score of ChatGPT is 0.9211813372.
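For readers unfamiliar with the BLEU metric used above, the following is a minimal sketch of sentence-level BLEU as defined by Papineni et al. (2002): modified n-gram precisions are combined by a geometric mean and scaled by a brevity penalty. The function name and the token lists are illustrative only, not the texts or settings used in this study (production work typically uses a library implementation with smoothing, e.g. NLTK's `sentence_bleu` or sacreBLEU).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU for one candidate against one reference.

    Assumes non-empty, pre-tokenized inputs. Returns 0.0 if any
    n-gram precision is zero (which would zero the geometric mean).
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped counts: a candidate n-gram is credited at most as
        # often as it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect translation scores 1.0, and any deviation from the reference lowers the score; the scores reported in the abstract (0.944 for DeepL, 0.921 for ChatGPT) indicate both outputs were very close to the reference translation.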


Published

2024-09-30

How to Cite

Nugraheni, Y. M., & Sutrisno, A. (2024). DeepL VS. ChatGPT: MACHINE TRANSLATION EVALUATION. Prologue: Journal on Language and Literature, 10(2), 411–426. https://doi.org/10.36277/jurnalprologue.v10i2.174