
Publication

Continual Learning in Neural Networks

Book - Dissertation

Artificial neural networks have exceeded human-level performance on several individual tasks (e.g. voice recognition, object recognition, and video games). However, such success remains modest compared to human intelligence, which can learn and perform an unlimited number of tasks. The human ability to learn and accumulate knowledge over a lifetime is an essential aspect of intelligence. In this respect, continual machine learning aims at a higher level of machine intelligence by providing artificial agents with the ability to learn online from a non-stationary and never-ending stream of data. A key component of such a never-ending learning process is overcoming the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The work described in this thesis is dedicated to the investigation of continual learning and of solutions that mitigate the forgetting phenomenon in neural networks.

To approach the continual learning problem, we first assume a task-incremental setting where tasks are received one at a time and data from previous tasks are not stored. We start by developing a system that aims for expert-level performance on each learned task: it reserves a separate specialist model for each task and sequentially learns a gate that forwards the input data to the corresponding specialist. We then consider the incremental learning of multiple tasks using a shared model of fixed capacity. For each task, we identify the most informative features and minimize their divergence during the learning of later tasks, using the current task data as a proxy. As an alternative to relying on the current task data, which might follow a very different distribution than previous data, the important parameters in a model can be identified and subsequent changes to them penalized. However, when accounting for an unlimited sequence of tasks, it is impossible to preserve all previous knowledge. As a method that adapts to specific test conditions, we propose learning the important parameters at deployment time, while the model is active in its test environment. As a result, catastrophic forgetting is overcome, while graceful, selective forgetting is tolerated. To further account for future tasks, we study the role of sparsity in continual learning and propose a new regularizer that significantly reduces the percentage of parameters dedicated to each task and, as a consequence, markedly improves continual learning performance.

Since the task-incremental setting cannot be assumed in all continual learning scenarios, we also study the more general online continual learning setting. We consider an infinite stream of data drawn from a non-stationary distribution with a supervised or self-supervised training signal. We first propose a protocol that brings our work on regularizing the important parameters to the online continual learning setting and show improved learning performance over different streams of data. To account for more challenging situations where the input distribution undergoes larger changes, we explore the use of a fixed-size buffer of samples selected from the previous history, and propose a sample selection method that makes no assumptions about the data-generating distribution. To the best of our knowledge, we were the first to tackle the online continual learning problem.
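As an illustration of the importance-weight regularization idea summarized above (identifying important parameters and penalizing later changes to them), the following is a minimal sketch assuming a PyTorch model. The function names and the particular sensitivity measure (gradient of the squared output norm on unlabeled data) are illustrative assumptions, not the thesis's exact formulation.

```python
import torch

def estimate_importance(model, data_loader):
    """Accumulate a per-parameter importance weight (omega) from the
    sensitivity of the squared L2 norm of the network output to each
    parameter, measured on data from the task just learned."""
    omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, _ in data_loader:          # labels are not needed here
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()   # sensitivity of the output norm
        for n, p in model.named_parameters():
            if p.grad is not None:
                omega[n] += p.grad.abs()
        n_batches += 1
    return {n: w / max(n_batches, 1) for n, w in omega.items()}

def importance_penalty(model, omega, old_params, strength=1.0):
    """Quadratic penalty, added to the loss of the current task, that
    discourages changes to parameters deemed important for earlier tasks."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (omega[n] * (p - old_params[n]).pow(2)).sum()
    return strength * penalty

# old_params would be detached copies taken after finishing the previous task:
# old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
```

Because the importance estimate only needs unlabeled inputs, the same computation could in principle be run at deployment time on test-environment data, which is the adaptive variant described above.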
The methods proposed in this thesis tackle important aspects of continual learning. They were evaluated on different benchmarks and over various learning sequences, advancing the state of the art in continual learning and critically identifying the challenges of bringing continual learning into application.
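For the online continual learning setting discussed above, a fixed-size buffer of samples from the previous history can be maintained without any assumption on the data-generating distribution. The sketch below uses reservoir sampling as a simple, well-known baseline for such a buffer; it is an illustration only and does not reproduce the sample selection criterion proposed in the thesis.

```python
import random

class ReservoirBuffer:
    """Fixed-size buffer filled from an infinite stream by reservoir
    sampling, which keeps a uniform subset of the history seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / n_seen
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        # Mini-batch of stored examples to replay alongside new stream data
        return random.sample(self.buffer, min(k, len(self.buffer)))
```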
Publication year: 2019
Accessibility: Open