Publication
Spatial Representation in Models of Images and Text with Applications to Medical Document Indexing and Autonomous Driving
Book - Dissertation
The project addresses the interaction between autonomous systems and their users, in which visual and language information complement and reinforce one another. It does so by learning representations that jointly capture the meaning of language and visual reality, and that allow visual situations to be translated into language and language into visuals. The approach is based on neural networks, more specifically multimodal auto-encoders and generative adversarial networks trained on paired visual and textual datasets. The focus lies on learning and applying multimodal embeddings that generalize across tasks and account for both the objects and actions in visual scenes and the lexical content and grammatical organization of the corresponding language descriptions.
Year of publication: 2022
Accessibility: Open