

Deep learning for multimodal representation learning

Many tasks require multimodal representations: visual question answering, cross-modal retrieval, and phrase grounding, among others. These domains are bridged by the need for methods that can project all modalities into a joint, shared latent space. This space should be structured and should capture the correspondences between the signals that generated the data. In this thesis, we aim to explore and improve methods that learn such multimodal representations. In particular, our goals are to improve individual unimodal representation learning methods, to find better ways to fuse unimodal representations into a multimodal representation, and to develop methods that jointly learn multimodal representations from multiple data streams.
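To make the idea of a joint, shared latent space concrete, below is a minimal sketch of one common approach: project each modality's features through its own linear map into a shared space and train with a contrastive (InfoNCE-style) objective so that paired image/text examples land close together. All dimensions, the random features, and the choice of loss here are illustrative assumptions for the sketch, not the specific methods developed in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unimodal features: 4 paired image/text examples (hypothetical dims).
image_feats = rng.normal(size=(4, 512))   # e.g. CNN image features
text_feats = rng.normal(size=(4, 300))    # e.g. pooled word embeddings

# Per-modality projections into a shared 128-d latent space
# (randomly initialised here; in practice learned by gradient descent).
W_img = rng.normal(size=(512, 128)) / np.sqrt(512)
W_txt = rng.normal(size=(300, 128)) / np.sqrt(300)

def project(x, W):
    """Project features into the joint space and L2-normalise them."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

z_img = project(image_feats, W_img)
z_txt = project(text_feats, W_txt)

# Cosine-similarity matrix: entry (i, j) compares image i with text j.
sim = z_img @ z_txt.T

def info_nce(sim, temperature=0.07):
    """Contrastive loss: each image should best match its own text."""
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # diagonal = true pairs

loss = info_nce(sim)
```

Minimising this loss pulls each true image/text pair together in the shared space while pushing apart mismatched pairs, which is what gives the space the structured, correspondence-capturing property described above.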

Date: 20 Sep 2019 → Today
Keywords: computer vision, deep learning, natural language processing, machine learning
Disciplines: Computer vision, Natural language processing, Machine learning and decision making
Project type: PhD project