Project

From Text to Time: Machine Learning Approaches to Temporal Information Extraction from Text

Temporal information extraction has long been a crucial aspect of automatic language understanding. With the increasing digitization of texts such as newspapers and electronic health records, high-quality extraction of temporal information about the described events enables many applications, such as question answering, summarization, temporal information retrieval, and automatic timeline visualization.

In this dissertation, we investigate and propose different machine learning approaches for temporal information extraction from text. Our five main contributions show how symbolic knowledge about temporal reasoning and statistical, neural-network-based approaches can be used and combined to improve temporal extraction, in terms of both prediction quality and coverage.

First, we construct a document-level structured learning approach for temporal relation extraction that incorporates hard and soft temporal constraints during training and prediction. We show that the document-level constraints can help improve the quality of the model's predictions and make the predicted temporal relations more consistent, which is important for timeline construction.
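To make the idea of a hard temporal constraint concrete, here is a minimal sketch that checks a set of predicted BEFORE relations for consistency under transitivity. It is an illustration only, not the structured learning approach of the dissertation; the function names `temporal_closure` and `is_consistent` and the example events are hypothetical.

```python
def temporal_closure(before):
    """Transitive closure of a set of (earlier, later) BEFORE pairs.

    Assumption: only the BEFORE relation is modeled; real temporal
    relation sets (e.g. INCLUDES, OVERLAP) need a richer algebra.
    """
    closure = set(before)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a BEFORE b and b BEFORE d implies a BEFORE d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_consistent(before):
    """True if no event ends up both before and after another event."""
    closure = temporal_closure(before)
    return all((b, a) not in closure for a, b in closure)


# Hypothetical document-level predictions
predictions = {("admission", "surgery"), ("surgery", "discharge")}
print(is_consistent(predictions))                                   # consistent chain
print(is_consistent(predictions | {("discharge", "admission")}))    # introduces a cycle
```

A document-level model could, under these assumptions, use such a check to reject or penalize label assignments whose closure contains a contradiction.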

Second, we design a neural temporal relation extraction model and investigate how its word representations can be optimized efficiently using unlabeled textual data. Multi-task learning is used to train the representations on two objectives: the supervised relation extraction objective, based on the labeled data, and a skip-gram objective, based on raw text.
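One common way to write such a two-objective setup is as a weighted sum of losses over shared word representations. The form below is a sketch under stated assumptions, not the dissertation's exact formulation: the weight $\lambda$, the symbols $\theta_{\text{emb}}$ (shared embeddings), $\theta_{\text{rel}}$ (relation-classifier parameters), $\theta_{\text{sg}}$ (skip-gram output parameters), and the datasets $D_{\text{lab}}$ (labeled event pairs $x$ with relation label $y$) and $D_{\text{raw}}$ (word-context pairs $(w, c)$ from raw text) are all introduced here for illustration.

```latex
\mathcal{L}(\theta_{\text{emb}}, \theta_{\text{rel}}, \theta_{\text{sg}})
  = \underbrace{-\sum_{(x, y) \in D_{\text{lab}}}
      \log p(y \mid x;\, \theta_{\text{emb}}, \theta_{\text{rel}})}_{\text{supervised relation extraction}}
  \;+\; \lambda \,
    \underbrace{\Bigl(-\sum_{(w, c) \in D_{\text{raw}}}
      \log p(c \mid w;\, \theta_{\text{emb}}, \theta_{\text{sg}})\Bigr)}_{\text{skip-gram on raw text}}
```

Because $\theta_{\text{emb}}$ appears in both terms, gradients from the unlabeled text also update the representations used by the relation classifier.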

Our third contribution is a literature survey on how temporal reasoning can be successfully and efficiently integrated into temporal information extraction models. The survey highlights multiple gaps in the literature, and provides interesting avenues for future work.

Building on insights from temporal reasoning frameworks, as our fourth contribution, we investigate the construction of timelines of events from text. A new method for relative timeline construction from graphs of temporal relations is proposed. More importantly, a new paradigm is introduced to extract timelines without the need for intermediate temporal relation extraction.
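As a simple point of reference for what building a relative timeline from a graph of temporal relations can mean, the sketch below assigns each event a rank by topologically ordering BEFORE relations. This is a textbook baseline (Kahn's algorithm with longest-path ranks), not the new method or the new paradigm proposed in the dissertation; `relative_timeline` and the example events are hypothetical.

```python
from collections import defaultdict


def relative_timeline(before):
    """Map each event to a relative position (0 = earliest).

    `before` is an iterable of (earlier, later) pairs. An event's rank is
    the length of the longest BEFORE chain leading to it, so events with
    the same rank are not ordered with respect to each other.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    events = set()
    for earlier, later in before:
        events.update((earlier, later))
        if later not in successors[earlier]:
            successors[earlier].add(later)
            indegree[later] += 1

    rank = {e: 0 for e in events if indegree[e] == 0}
    frontier = sorted(rank)  # events with no predecessors
    processed = 0
    while frontier:
        event = frontier.pop()
        processed += 1
        for nxt in successors[event]:
            rank[nxt] = max(rank.get(nxt, 0), rank[event] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                frontier.append(nxt)

    if processed != len(events):
        raise ValueError("cyclic temporal relations: no consistent timeline")
    return rank


# Hypothetical relation graph extracted from a clinical note
relations = {
    ("admission", "scan"),
    ("admission", "surgery"),
    ("scan", "discharge"),
    ("surgery", "discharge"),
}
print(relative_timeline(relations))
```

Note that this baseline depends entirely on the quality of the extracted relation graph, and a single inconsistent relation makes it fail.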

Our last contribution extends the previous work with the extraction of implicit and uncertain temporal information. An annotation scheme is proposed to annotate absolute timelines that can be queried in a probabilistic way, based on the annotated uncertainty, and a set of clinical records is annotated. Moreover, a model is put forward to extract such timelines from text.

The main conclusion of this dissertation is that symbolic temporal reasoning, which is key to capturing rigid temporal structure, should be well integrated into statistical (neural) models, which are crucial for handling the ambiguity and vagueness present in language. The contributions in this dissertation act as a case study and starting point for future research into this integration.

Date: 13 Jan 2016 → 13 Jan 2020
Keywords: Natural Language Processing, Information Extraction, Clinical Patient Records, Machine Learning, Temporal Information Extraction, Temporal Reasoning, Neural Networks
Disciplines: Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences
Project type: PhD project