Deep learning for multimodal representation learning
KU Leuven
Many tasks require multimodal representations: visual question answering, cross-modal retrieval, and phrase grounding, among others. These domains are united by the need for methods that can project all modalities into a joint, shared latent space. This space should be structured, and should capture the correspondences between the signals that generated the data. In this thesis, we aim to explore and improve the different ...
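The idea of a joint latent space can be illustrated with a minimal numpy sketch. All names, dimensions, and the random projections below are placeholders chosen purely for illustration (in practice the projections would be learned, e.g. with a contrastive objective); this is not a specific model from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: image and text features come from
# different encoders and have different dimensionalities.
IMG_DIM, TXT_DIM, SHARED_DIM = 512, 300, 128

# Two linear projections (random placeholders standing in for learned
# weights) map each modality into the same shared latent space.
W_img = rng.standard_normal((IMG_DIM, SHARED_DIM)) / np.sqrt(IMG_DIM)
W_txt = rng.standard_normal((TXT_DIM, SHARED_DIM)) / np.sqrt(TXT_DIM)

def embed(features, W):
    """Project features into the shared space and L2-normalize,
    so that a dot product is cosine similarity."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# A toy image/caption pair, each embedded into the shared space.
img = embed(rng.standard_normal((1, IMG_DIM)), W_img)
txt = embed(rng.standard_normal((1, TXT_DIM)), W_txt)

# Cross-modal retrieval then reduces to nearest-neighbour search
# under this similarity score in the shared space.
similarity = float(img @ txt.T)
```

Once both modalities live in one space, tasks like retrieval or grounding become geometric operations (nearest-neighbour search, alignment) rather than modality-specific pipelines.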