Publication

Spatial Representation in Models of Images and Text with Applications to Medical Document Indexing and Autonomous Driving

Book - Dissertation

The project addresses interaction between autonomous systems and their users in which visual and language information complement and reinforce one another. It does so by learning representations that jointly capture the meaning of language and of visual reality, and that allow visual situations to be translated into language and language into visuals. The approach is based on neural networks, more specifically multimodal auto-encoders and generative adversarial networks trained on paired visual and textual datasets. The focus lies on learning and applying multimodal embeddings that generalize across multiple tasks and account for both the objects and actions in visual scenes and the lexical content and grammatical organization of the corresponding language descriptions.
Year of publication: 2022
Accessibility: Open