Title: Selecting for Innovation: development and test of idea selection strategies in organizations, based on the "Motivated Information Processing in Groups" model
Abstract: Organizations use brainstorming techniques in order to innovate. With these techniques, ideas are first generated and then selected. However, recent studies show that people are not good at selecting ideas. This research aims to optimize idea selection using the Motivated Information Processing in Groups model, a validated model derived from the decision-making literature.

Title: Model estimation and selection for multiresolution graphical models.
Supervisor: Gerda Claeskens
Group: Operations Research and Statistics Research Group (ORSTAT) (main work address Leuven)
Abstract: The main goal of the project is to develop and validate methods to estimate networks at different levels or resolutions. One of the aims is to determine which level is most appropriate for estimating and interpreting such network models; this will help researchers in the field to better understand the properties and characteristics of the models that should be used to analyze the available data. The techniques I propose are oriented towards the estimation aspect (selecting a graphical object is inherently connected to selecting the nodes between which edges are placed, and to the type of edges that one should place between nodes), as well as towards a thorough and rigorous study of theoretical properties in a multiresolution framework. The multiresolution aspect relates to data collected at different levels of coarseness, so that one is interested in selecting an appropriate level of coarseness. Natural contexts where such situations occur are, for example, financial applications, image denoising, gene expression data and functional magnetic resonance imaging (fMRI). In the analysis of brain connectivity from fMRI data, a scientist takes a series of measurements on brain regions, which range from very coarse (relatively large in size) to very fine (relatively small in size). Once sound new methodologies are created, I will propose extensions that relax constraining assumptions or carry the approach over to other classes of models.

Title: Inference after model selection and averaging via confidence distributions and curves
Supervisor: Gerda Claeskens
Group: Operations Research and Statistics Research Group (ORSTAT) (main work address Leuven), Statistics and Risk
Abstract: Model selection and model averaging are common practices to find the best model that explains the observed data. When the working model is selected using data-driven methods and the same data are used for inference about population parameters, the guarantees of classical inference techniques might no longer hold. This dissertation discusses ways of producing valid inference for post-selection and model-averaged estimators via confidence distributions and confidence curves. While classical inference concepts such as p-values, confidence intervals and point estimators can easily be read from a confidence distribution, it gives more information about the value of a parameter of interest than a single confidence interval or a "single hypothesis" test. The first three chapters focus on how to obtain optimal post-selection conditional confidence distributions for possibly misspecified selected linear, generalized linear and linear mixed regression models. The fourth chapter provides a bootstrap approach to estimate the distribution of the more general model-averaged estimators in likelihood-based models.
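To make the confidence-distribution idea in the abstract above concrete: for the textbook case of a normal mean with known standard deviation, the confidence distribution is C(θ) = Φ((θ − θ̂)/se) and the confidence curve is cc(θ) = |1 − 2C(θ)|, so that the equal-tailed confidence interval at any level can be read from a single object. The sketch below is purely illustrative and is not code from the dissertation; the function and variable names are made up for this example.

```python
import numpy as np
from scipy.stats import norm

def confidence_curve(theta_grid, theta_hat, se):
    """Confidence distribution C(theta) = Phi((theta - theta_hat)/se)
    and confidence curve cc(theta) = |1 - 2*C(theta)| for a normal mean."""
    cd = norm.cdf((theta_grid - theta_hat) / se)
    return cd, np.abs(1.0 - 2.0 * cd)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=40)      # data with known sigma = 2
theta_hat, se = x.mean(), 2.0 / np.sqrt(len(x))

grid = np.linspace(theta_hat - 4 * se, theta_hat + 4 * se, 2001)
cd, cc = confidence_curve(grid, theta_hat, se)

# The set {theta : cc(theta) <= level} is the equal-tailed confidence
# interval at that level; the point where cd = 0.5 is the point estimate.
for level in (0.50, 0.95):
    inside = grid[cc <= level]
    print(f"{int(level*100)}% CI: [{inside.min():.3f}, {inside.max():.3f}]")
```

This shows why a confidence curve is more informative than a single interval: one curve encodes the intervals at every confidence level simultaneously.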
Title: A study of correct statistical inference after model selection took place
Supervisor: Gerda Claeskens
Group: Operations Research and Statistics Research Group (ORSTAT) (main work address Leuven)
Abstract: Classical methods for inference in statistics assume that a model is given before looking at the data and that this model perfectly describes how the data were generated. Statistical practice proceeds in another fashion: the data are used, visually via plots and/or by fitting several models, performing variable selection, model selection or regularization, to arrive at one or more plausible models. Those selected models are then used for statistical inference. In this thesis, a study will be made of how to obtain valid inference, with honest p-values for hypothesis testing and with confidence intervals that have correct coverage, when models are used that have been selected in some form.
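The coverage problem this project addresses is easy to reproduce in simulation. The sketch below is illustrative only (not taken from the thesis): out of ten pure-noise predictors it selects the one with the largest absolute t-statistic and then builds a naive 95% confidence interval for its slope as if no selection had happened. The true slope is zero, yet the naive interval misses it far more often than 5% of the time.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(1)
n, p, reps, covered = 50, 10, 2000, 0

for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                  # true model: y is pure noise
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    beta = (Xc * yc[:, None]).sum(axis=0) / sxx    # simple-regression slopes
    resid_var = np.array([((yc - b * xc) ** 2).sum() / (n - 2)
                          for b, xc in zip(beta, Xc.T)])
    se = np.sqrt(resid_var / sxx)
    j = np.argmax(np.abs(beta / se))        # data-driven selection step
    half = t_dist.ppf(0.975, n - 2) * se[j]
    covered += (beta[j] - half <= 0.0 <= beta[j] + half)

print(f"coverage of naive 95% CI after selection: {covered / reps:.3f}")
# Typically well below 0.95, because the same data chose the predictor.
```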
"Gerda Claeskens" "Operations Research and Statistics Research Group (ORSTAT) (main work address Leuven)" "The main object of this research are signals, images,. . . that are observed with noise. Once the object of interest can be described in a parsimonious way by a projection onto a family of mathematically well defined atoms (waveforms, bases,. . . ), the denoising or smoothing problem can be interpreted as a model selection problem in the domain of the projection coefficients. Of the currently existing selection rules, a vast majority solely exploits the information given by an individual coefficient. None of the existing approaches addresses the challenge of linking the geometrical structure in the coefficient domain to the development of a functional model on the decay properties of the coefficients over the different scales. We propose to step away from the traditional Besov constraint in order to refine the minimax approach. Also maxiset properties will be studied. Extensions include structures that are no longer trees, such as graphical models. We will extend model selection methods by pooling information and will study the statistical properties of model averaging estimators in this framework." "AdaPore: rigorous and fully adaptive model selection for multiphase flow through porous media" "Sorin POP" "Computational mathematics" "Multiphase flow through heterogeneous porous domains describes processes such as oil and gas extraction, hydrogeological flow, and CO2 sequestration in the subsurface. The porous flow models depend on the number of phases involved and range from linear elliptic to nonlinear degenerate parabolic, hyperbolic, and even higher-order equations if hysteresis and dynamic effects are considered. A choice of model is often made a priori based on experience and computational budget. This project aims to investigate the adaptive selection of accurate and computationally inexpensive mathematical and numerical models locally in space-time subdomains. This will be achieved by deriving locally space-time efficient a-posteriori estimators and other mathematically rigorous error indicators. The sub-problems defined over the subdomains will be solved in parallel and will be combined through a heterogeneous domain decomposition scheme. The convergence of the scheme will be proved and codes will be developed for industrial-scale problems. For the hyperbolic case, non-classical Riemann solutions, resulting from the incorporation of hysteresis and dynamic effects, will be derived and implemented in a Godunov type solver. Machine learning will be used to expedite the selection of sub-domains and non-classical Riemann solutions at mesh interfaces." "Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building." "Bart Goethals" "ADReM Data Lab (ADReM)" "In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers, flow meters, etc). However, there are many characteristics that cannot be obtained through online equipment and for which time-consuming and computationally expensive analysis is required.For this reason models are typically used to predict the results of such an analysis from the process variables. The analysis is then used as a confirmation of the model. 
Title: AdaPore: rigorous and fully adaptive model selection for multiphase flow through porous media
Supervisor: Sorin Pop
Group: Computational mathematics
Abstract: Multiphase flow through heterogeneous porous domains describes processes such as oil and gas extraction, hydrogeological flow, and CO2 sequestration in the subsurface. The porous flow models depend on the number of phases involved and range from linear elliptic to nonlinear degenerate parabolic, hyperbolic, and even higher-order equations if hysteresis and dynamic effects are considered. A choice of model is often made a priori based on experience and computational budget. This project aims to investigate the adaptive selection of accurate and computationally inexpensive mathematical and numerical models locally in space-time subdomains. This will be achieved by deriving locally space-time efficient a posteriori estimators and other mathematically rigorous error indicators. The sub-problems defined over the subdomains will be solved in parallel and combined through a heterogeneous domain decomposition scheme. The convergence of the scheme will be proved and codes will be developed for industrial-scale problems. For the hyperbolic case, non-classical Riemann solutions, resulting from the incorporation of hysteresis and dynamic effects, will be derived and implemented in a Godunov-type solver. Machine learning will be used to expedite the selection of sub-domains and non-classical Riemann solutions at mesh interfaces.

Title: Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building.
Supervisor: Bart Goethals
Group: ADReM Data Lab (ADReM)
Abstract: In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers or flow meters). However, there are many characteristics that cannot be obtained through online equipment and for which time-consuming and computationally expensive analysis is required. For this reason, models are typically used to predict the results of such an analysis from the process variables; the analysis is then used as a confirmation of the model. Models are sometimes also used to predict the readings of online hardware analysers, which may fail due to corrosion or drift from their calibration point. In this project we address a number of issues related to the construction of models using Support Vector Machines (SVMs). Our interest in building models with SVMs has several reasons:
- It is well known that SVMs can handle high-dimensional data without suffering from the curse of dimensionality.
- The use of kernels enables nonlinear modelling.
- SVMs can be made insensitive to noise and outliers.
- Finally, the ability of SVMs to identify "unusual" data points makes them useful in detecting outliers and anomalies.
The issues we aim to address in this project are the following.
I. Feature selection and incorporation of prior knowledge: the aim is to investigate whether similar results can be obtained for Support Vector Regression and how well the technique applies to single-class problems.
II. Adaptive model building: techniques that can handle the adaptivity of the inferential sensor at all levels, and especially when the mathematical model needs to be partially rebuilt, are still in their infancy and are the second topic of this research project.

Title: Robust inference and model selection techniques
Supervisor: Stefan Van Aelst
Group: Department of Applied Mathematics, Computer Science and Statistics
Abstract: The development and study of robust inference techniques and robust methods for the construction and selection of statistical models. Robust methods and techniques for high-dimensional data will be developed. The level of robustness as well as efficient computation time are important factors in this process.
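To illustrate the kind of robustness the last project is about, here is a minimal sketch (purely illustrative, not the project's methods): a Huber M-estimator of location, computed by iteratively reweighted averaging with a fixed MAD scale, stays close to the bulk of the data while the sample mean is dragged away by a few gross outliers. All names and tuning constants are standard textbook choices, not taken from this research.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location via iteratively reweighted means.
    Scale is fixed at the (normal-consistent) MAD of the data."""
    scale = 1.4826 * np.median(np.abs(x - np.median(x)))
    mu = np.median(x)
    for _ in range(max_iter):
        r = (x - mu) / scale
        # Huber weights: 1 inside [-c, c], downweighted outside
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(3)
clean = rng.normal(loc=5.0, scale=1.0, size=95)
x = np.concatenate([clean, np.full(5, 50.0)])    # 5% gross outliers

print(f"sample mean:    {x.mean():.3f}")         # pulled towards 50
print(f"Huber location: {huber_location(x):.3f}")  # stays near 5
```

The trade-off mentioned in the abstract is visible here: the bounded weights buy robustness at some efficiency cost on clean data, and the iterative scheme is where computation time becomes a concern in high dimensions.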