Bridging the Image and Text Spaces with Neural Network Methods for Multimodal Representation Learning and Spatial Understanding
KU Leuven
In recent years, deep learning models have greatly advanced computer vision and natural language processing (NLP). These models produce feature representations for images and words. For example, convolutional neural networks (CNNs) and the skip-gram model of Mikolov et al. are popular means of extracting such features from images and text corpora, respectively. These representations are typically employed either in concept ...