Project

CELL: ContExtual machine Learning of Language translations

Neural machine translation is a popular and successful approach to translating from a source language to a target language. Still, it can produce wrong translations when language is ambiguous, vague or implicit, or when words were never seen in the training data. Neural network technology makes it possible to integrate extra-linguistic contextual information into the translation process. In CELL, we aim to design, develop and evaluate multimodal machine translation (MMT) models that integrate visual information obtained from images into the meaning representations created by the neural networks. We will especially investigate attention models that align content in the source and target languages and, in the case of MMT, align linguistic content with content in the visual data. This contextual attention will help to generate more correct translations, to compare the neural models theoretically with older statistical machine translation models, and to better explain the neural translation models. The developed technologies will be evaluated on a benchmarking dataset containing images with English captions and their translations, on a dataset of e-commerce products and their multilingual descriptions, and on multilingual subtitles of video documentaries, covering languages such as English, German, French, Czech and Dutch.

Date: 1 Oct 2019 → 30 Sep 2023

Keywords: Natural language processing, Machine translation, Multimodal machine translation, Attention mechanisms

Disciplines: Natural language processing
