
Project

Deep Learning for Natural Language Processing

As the amount of unstructured text data grows dramatically, the need to intelligently process that data and to extract different types of knowledge from it is increasing just as remarkably. One of the goals in natural language processing (NLP) is to develop general and scalable methods that can jointly solve tasks such as extracting information from large unstructured collections, analysing sentiment in social networks, and performing grammatical analysis for essay grading, while learning the necessary intermediate representations of the linguistic units involved. However, the standard approaches towards this goal have two shortcomings:

1) Simplifying language assumptions: standard approaches such as bag-of-words often force the data into a format compatible with a particular algorithm, losing information such as word order and potential sentiment interpretations along the way.

2) Feature representations: constructing a machine learning system has required considerable domain expertise to design feature extractors that transform the raw data into suitable feature representations such as part-of-speech tags or parse-tree features. Such transformations take a long time to develop, and integrating them for each new task encumbers both the development and the runtime of the final algorithm. Furthermore, linguistically inspired feature engineering is not easily portable to languages for which the necessary linguistic resources are not available.

Deep learning addresses both of these shortcomings. Deep learning methods allow a machine to work directly with raw data and to automatically discover the representations needed for detection or classification. With multiple levels of representation, deep learning, which is often realized with neural networks (e.g., recurrent neural networks), composes simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. Deep learning is especially well suited to machine perception tasks in which the raw underlying features are not individually interpretable, and it has been successfully applied to many NLP tasks.

My proposed research goal is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning, in order to solve multiple higher-level language tasks. My work aims at 1) providing innovative and principled ways to derive suitable representations of complex linguistic units such as phrases and sentences; 2) building graph-based, biologically inspired deep learning frameworks for NLP; 3) designing representations of symbols; and 4) explaining and visualizing deep learning for NLP. Specifically, my future research objectives and planned methodologies are listed below.

1) New representation forms for high-level linguistic units

Deep learning methods for language processing owe much of their success to neural network language models, in which words and other high-level linguistic units are represented as dense real-valued vectors. An interesting question is whether the meaning of complex linguistic units, i.e., word sequences such as phrases, sentences, and paragraphs, should also be represented as vectors. Intuitively, if a word is represented as a vector, then representing a whole sentence or paragraph as a single vector seems insufficient to capture its syntactic information. New forms of representation for phrases and sentences, e.g., a queue, stack, or matrix of vectors, seem more appropriate at the syntactic level.

An interesting approach to this problem is to establish a mapping from words to the basic elements of a graph, such as nodes, edges, and labels, so that high-level linguistic units can be expressed as graphs. The advantage of such techniques is that graphs are naturally built to represent connections, so the relationships between words themselves, as well as between words and high-level linguistic units, can be represented in a clear way. Techniques that combine deep learning and graphs can therefore be used to explore new representation forms for high-level linguistic units; a small sketch of this contrast follows below.
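As a minimal, purely illustrative sketch (the embeddings, their dimensionality, and the averaging and adjacency schemes are all assumptions made for this example, not methods proposed by the project), the snippet below contrasts the single-vector view of a sentence with a graph view in which words become nodes and neighbouring words are linked by edges:

    import numpy as np

    # Toy word embeddings; real ones would be learned by a neural
    # language model, and the values here are made up.
    embeddings = {
        "deep":     np.array([0.2, 0.7, 0.1]),
        "learning": np.array([0.9, 0.3, 0.4]),
        "for":      np.array([0.1, 0.1, 0.8]),
        "nlp":      np.array([0.6, 0.5, 0.2]),
    }
    sentence = ["deep", "learning", "for", "nlp"]

    # Single-vector view: averaging collapses the sentence into one
    # point, discarding word order and syntactic structure.
    sentence_vector = np.mean([embeddings[w] for w in sentence], axis=0)
    print("sentence vector:", sentence_vector)

    # Graph view: words are nodes and, in this toy scheme, adjacent
    # words are joined by edges, keeping word-to-word relations explicit.
    nodes = {w: embeddings[w] for w in sentence}
    edges = list(zip(sentence, sentence[1:]))
    print("edges:", edges)

Richer graphs, e.g. with dependency-labelled edges, would follow the same word-to-node mapping while carrying more syntactic information than a single vector can.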
2) Graph-based, biologically inspired deep learning frameworks for NLP

It is well known that the convolutional neural network (CNN) is a biologically inspired model from Computer Vision, where it has been found highly effective and has become the most commonly used model in diverse applications. By contrast, a biologically inspired model for NLP remains undiscovered. Recently, great effort has been put into graph-based deep learning frameworks.

Another interesting question concerns the graph-level relationships between neurons, both within the same layer and across different layers. A deep learning network is essentially a directed graph, so graph theory, for instance results about subgraphs and network flows, can be used to explore the hidden relationships between neurons. Understanding the relationships between neurons can be helpful in the design of new deep learning frameworks; the sketch below makes the directed-graph view concrete.
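As an illustrative sketch only (the layer sizes, naming scheme, and weights are invented for this example), the snippet below encodes a tiny feed-forward network as a directed graph whose nodes are neurons and whose edges are weighted connections, so that standard graph algorithms can be run over it:

    # A tiny feed-forward network viewed as a directed graph: neurons
    # are nodes named "layer:index", connections are weighted edges.
    layer_sizes = [3, 4, 2]  # made-up input, hidden, and output widths

    edges = {}  # (source neuron, target neuron) -> weight
    for layer, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        for i in range(n_in):
            for j in range(n_out):
                edges[(f"{layer}:{i}", f"{layer + 1}:{j}")] = 0.1 * (i + j)

    # Graph-theoretic questions then become easy to pose, e.g.
    # enumerating the paths from an input neuron to an output neuron
    # with a simple depth-first search.
    def paths(src, dst, trail=()):
        trail = trail + (src,)
        if src == dst:
            yield trail
            return
        for (a, b) in edges:
            if a == src:
                yield from paths(b, dst, trail)

    print(sum(1 for _ in paths("0:0", "2:1")))  # input-to-output paths

Once a network is in this form, notions such as subgraphs, path counts, and flows can be computed directly, which is the kind of analysis of neuron relationships proposed above.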
3) Representation of symbols

Natural language and symbols are intimately correlated. Symbols such as punctuation marks specify pauses and sentence endings, and they also convey sentiment: two sentences with exactly the same words but different symbols can express two opposite sentiments (compare, for instance, "You finished it!" with "You finished it?"). However, recent advances in deep learning for NLP seem to contradict this intuition. Symbols are fading away and are no longer needed during "reasoning"; they survive only as the input and output of these learning machines. Reasoning with symbols in natural language applications seems to be a relic of an ancient past.

A clearer understanding of the strict link between distributed/distributional representations and symbols will certainly lead to radically new deep learning networks. A survey [16] already draws the link between symbolic representations and distributed/distributional representations. Revitalizing the area of interpreting how symbols are represented inside neural networks helps to devise new deep neural networks that can exploit existing and novel symbolic models of classical natural language processing tasks.

4) Explanation and visualization of deep learning for NLP

The visualization of deep learning for Computer Vision has been progressing in recent years. Taking an image model as an example, the learned features in the first layer of representation typically indicate the presence or absence of edges at particular orientations and locations in the image; the second layer typically detects motifs by spotting particular arrangements of edges; the third layer may assemble motifs into larger combinations that correspond to parts of familiar objects; and subsequent layers detect objects as combinations of these parts. A comparable picture for NLP has not been discovered yet. Visualizing each layer of a deep learning model for NLP would surely help in understanding the mechanism of deep learning for NLP; one simple starting point is sketched below.
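As a minimal sketch (the stand-in model, word scores, and sentence are all invented; a real study would inspect the layers of a trained network), the snippet below illustrates one simple diagnostic, occlusion-based saliency: remove one word at a time, record how much the model's output changes, and render the result as a crude text heat map:

    import numpy as np

    # A stand-in "model" that scores a sentence by averaging made-up
    # per-word sentiment values; a trained network would go here.
    word_score = {"not": -0.9, "bad": -0.6, "at": 0.0, "all": 0.1}

    def model(words):
        return np.mean([word_score.get(w, 0.0) for w in words])

    # Occlusion: delete each word in turn; a large change in the output
    # marks a word the model relies on.
    sentence = ["not", "bad", "at", "all"]
    base = model(sentence)
    for i, w in enumerate(sentence):
        delta = base - model(sentence[:i] + sentence[i + 1:])
        bar = "#" * int(abs(delta) * 40)  # crude text heat map
        print(f"{w:>4} {delta:+.3f} {bar}")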

Date: 5 Sep 2017 → 5 Sep 2021
Keywords: Deep Learning, Sentence Vector, Recursive Neural Network, Recurrent Neural Network, Visualization
Disciplines: Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences
Project type: PhD project