
Publication

Uncertainty Estimation in Machine Learning: Applications in Neural Network Compression and Calibration

Book - Dissertation

In scientific fields that rely on plausible reasoning, it is critically important to account for uncertainties throughout the reasoning process. A typical pipeline consists of three steps: (1) collect data, (2) build a computational model, and (3) make predictions. For example, in a weather forecasting system it makes little sense in step (3) to state categorically whether it will rain tomorrow, because the atmosphere is inherently uncertain; instead, the prediction takes the form ``it will rain tomorrow with probability $x\%$''. The data collected in step (1) also carries uncertainties, since measurement tools are not perfect, and uncertainties can appear in step (2) if the model is probabilistic in nature. The uncertainties arising in these three steps are interconnected, and ignoring them leads to far too confident predictions. To reason well, it is therefore essential to estimate these uncertainties. In this thesis, we present novel methods to estimate the uncertainties that appear in all three steps. Uncertainty estimation is not itself the ultimate goal of the reasoning process; more importantly, we develop methods to incorporate these uncertainties into machine learning models. For each type of uncertainty, we study the theoretical properties of the proposed methods and demonstrate their effectiveness through a series of practical applications.

\Cref{ch:compression} describes an approach for estimating the uncertainty of observations and shows how to incorporate these uncertainties into a Gaussian process (GP), the model class considered in that chapter. To demonstrate the proposed method, we apply it to neural network model compression, where it is essential to reach the desired trade-off between the size and the performance of the compressed model by tuning compression parameters. Rather than measuring the performance of the compressed model on the full validation set, we show that a small subset of that set is sufficient; under this framework, the uncertainty of each observation can be estimated using the class of U-statistics. The method is demonstrated on VGG and ResNet models, and the resulting system finds optimal compression parameters in a matter of minutes on a standard desktop machine, orders of magnitude faster than competing methods.
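As a rough illustration of this idea, the sketch below (not the thesis code) estimates the noise of a subset-based accuracy measurement and passes it to a GP as per-observation noise. The `noisy_accuracy` helper and the toy pruning-ratio data are assumptions made for the example, and scikit-learn's `GaussianProcessRegressor` merely stands in for a GP with heteroscedastic observation noise.

```python
# Minimal sketch, not the thesis implementation: accuracy measured on a
# random validation subset is a sample mean of per-example correctness
# indicators (a degree-1 U-statistic), so its variance can be estimated
# from the same subset and passed to a GP as per-observation noise.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def noisy_accuracy(correct):
    """Return subset accuracy and an estimate of its variance."""
    correct = np.asarray(correct, dtype=float)
    n = len(correct)
    return correct.mean(), correct.var(ddof=1) / n

# Toy data: hypothetical compression parameters (e.g. pruning ratios) and
# simulated 0/1 correctness vectors on a 200-example validation subset.
rng = np.random.default_rng(0)
params = np.array([[0.1], [0.3], [0.5], [0.7]])
subset_obs = [noisy_accuracy(rng.binomial(1, p, size=200))
              for p in (0.92, 0.90, 0.85, 0.70)]
acc = np.array([a for a, _ in subset_obs])
noise_var = np.array([v for _, v in subset_obs])

# GP with the estimated, observation-specific noise on the diagonal (alpha).
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=noise_var,
                              normalize_y=True)
gp.fit(params, acc)
mean, std = gp.predict(np.array([[0.4]]), return_std=True)
print(mean, std)
```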
\Cref{ch:categorical} presents a novel covariance function for GPs over conditional parameter spaces. The model uncertainty of a GP is captured by its associated covariance function, and the covariance function proposed here exploits the structural information of the conditional parameter space considered in that chapter. The resulting GP model is highly sample-efficient, especially in the low-data regime. In such a conditional parameter space, we expect the method to be particularly valuable when one configuration is expensive to evaluate but correlates with another that is cheap: inferences about the expensive configuration can then be drawn from the cheap observations, a property closely related to so-called \textit{multi-fidelity optimization} (a generic toy sketch of a kernel over such a space is given below). To demonstrate its effectiveness, we evaluate the proposed method on a series of problems, including an optimization benchmark function, a neural network compression problem, pruning pre-trained large-scale neural network models, and searching for activation functions of ResNet20. Experimental results show that our approach significantly outperforms the current state of the art for conditional parameter optimization, and we provide a rigorous theoretical analysis of the proposed method.

\Cref{ch:calibration} studies post-hoc calibration of neural network classifiers. The purpose of post-hoc calibration is to transform a pre-trained model so that the transformed model produces calibrated predictions, meaning that its predictive distribution matches the empirical distribution; post-hoc calibration is thus one way to estimate prediction uncertainties for deterministic models. Existing approaches mostly focus on constructing calibration maps with low calibration error, but this property alone is not enough for a calibrator to be useful. In this chapter, we introduce a reject option into our calibration framework and, to control the behavior of this reject module, consider two constraints that are practical when designing a post-hoc calibration map (a toy illustration of calibration with a reject option is also given below). Under mild assumptions, two high-probability bounds are given with respect to these constraints. Empirical results on CIFAR-10, CIFAR-100 and ImageNet, across a range of popular network architectures, show that our proposed method significantly outperforms the current state of the art for post-hoc multi-class classification calibration.

Overall, this thesis presents novel methods to estimate uncertainties for both probabilistic and deterministic models, as well as methods to incorporate these uncertainties into the machine learning models under consideration. This research contributes ``one small step'' towards developing uncertainty-aware machine learning models.
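Returning to \Cref{ch:categorical}: the sketch below is a generic toy kernel over a conditional parameter space, in which branch-specific parameters contribute only when both configurations activate the same branch, while shared parameters and a reduced cross-branch correlation let cheap observations in one branch inform expensive ones in another. This construction is purely illustrative and is not the covariance function proposed in the thesis; all names and values are assumptions.

```python
# Generic illustrative kernel over a conditional parameter space; NOT the
# covariance function developed in the thesis. A configuration is a tuple
# (branch, shared_params, branch_specific_params).
import numpy as np

def rbf(a, b, lengthscale=1.0):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.exp(-0.5 * np.sum((a - b) ** 2) / lengthscale ** 2))

def conditional_kernel(x, y, cross_branch=0.3):
    """k(x, y) = k_branch * rbf(shared) + 1{same branch} * rbf(specific).

    k_branch is 1 for identical branches and `cross_branch` otherwise.
    Each summand is a valid kernel (a product of kernels, and a
    block-diagonal kernel), so their sum is a valid kernel as well.
    """
    bx, shared_x, spec_x = x
    by, shared_y, spec_y = y
    k_branch = 1.0 if bx == by else cross_branch
    k = k_branch * rbf(shared_x, shared_y)
    if bx == by:
        k += rbf(spec_x, spec_y)
    return k

# Example: two branches share one parameter but have different
# branch-specific parameters; the cross-branch correlation stays non-zero,
# so cheap observations in one branch can still inform the other.
x = ("branch_a", [0.20], [1.0])
y = ("branch_b", [0.25], [3.0])
print(conditional_kernel(x, x), conditional_kernel(x, y))
```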
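As a toy illustration of the post-hoc calibration setting in \Cref{ch:calibration}, the sketch below uses temperature scaling, a standard post-hoc calibration map, combined with a simple confidence-threshold reject rule. This is not the calibrator or the reject mechanism proposed in the thesis; the threshold, labels, and function names are assumptions made for the example.

```python
# Toy post-hoc calibration with a reject option; NOT the calibrator
# proposed in the thesis. Temperature scaling (a standard post-hoc
# calibration map) is fitted on held-out logits, and predictions whose
# calibrated confidence falls below a threshold are rejected.
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels):
    """Choose the temperature that minimizes validation negative log-likelihood."""
    def nll(t):
        p = softmax(val_logits / t)
        return -np.log(p[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

def calibrate_with_reject(logits, temperature, threshold=0.6):
    """Return calibrated probabilities and a mask of rejected predictions."""
    probs = softmax(logits / temperature)
    reject = probs.max(axis=1) < threshold  # low-confidence predictions
    return probs, reject

# Toy usage: random logits for a 10-class problem with noisy labels.
rng = np.random.default_rng(0)
val_logits = 3.0 * rng.normal(size=(500, 10))
val_labels = (val_logits + 2.0 * rng.normal(size=(500, 10))).argmax(axis=1)
temperature = fit_temperature(val_logits, val_labels)
probs, reject = calibrate_with_reject(3.0 * rng.normal(size=(5, 10)), temperature)
```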
Year of publication: 2023
Accessibility: Open