Project

Jointly exploring the structural representation in vision and language

The goal of my PhD research is to explore structural representations in both visual data (images and videos) and natural language, particularly in weakly supervised or unsupervised settings. Extracting graph structures from images plays a critical role in visual scene understanding, with applications such as search engines and image archiving. Likewise, inducing tree-structured grammars for natural language advances language understanding and is widely used in sentiment analysis and dialog systems. Recent research has found that visual cues provide additional regularization for language grammar induction, which further supports the view that the two modalities share some common structure. Motivated by this observation, my PhD will focus on jointly parsing the structures in vision and language and building cross-modal correspondences between them in weakly supervised or unsupervised settings, with the aim of obtaining a shared structure for both modalities and applying it to other vision-language tasks.

Date: 3 Sep 2021 → Today
Keywords: computer vision, natural language processing, machine learning
Disciplines: Natural language processing, Computer vision, Machine learning and decision making
Project type: PhD project