Project
Methods for developing and validating clinical risk prediction models with applications
Risk prediction models are gaining importance as clinical decision support tools. Nevertheless, they do not find their way into everyday practice easily enough. Important factors for uptake are simplicity, validity and robustness. These can be achieved through sensible model development that yields a limited set of predictors, a sufficiently large sample size and/or the use of penalization methods to prevent overfitting. However, variable selection typically relies on statistical considerations and ignores the potential utility of predictors for clinical decision making. This PhD project will propose utility-based variable selection methods.
In addition, it is often advised to have sufficient ‘events per variable’ (EPV) when developing a model, to avoid overfitted models that do not validate on new data. However, different types of predictors (categorical, ordinal, continuous, nonlinear terms, interactions) are likely to require different numbers of events, so the general recommendation of at least 10 EPV may often be too rough. This PhD project will examine EPV requirements for different types of predictors, and will investigate whether learning curves of model performance during variable selection are an efficient alternative to simple EPV recommendations. Finally, EPV recommendations and learning curves will be investigated for penalized regression methods.
These methods will be tested on several topics within obstetrics and gynecology, such as ovarian cancer diagnosis using data from the International Ovarian Tumor Analysis (IOTA) group, endometrial cancer diagnosis, and prediction of pregnancy outcomes.
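Why a single EPV rule may be too rough can be made concrete by counting estimated parameters instead of variables. The following is a minimal sketch, assuming the common convention that EPV is the number of events in the less frequent outcome class divided by the number of candidate predictors (or parameters), that categorical predictors are dummy coded, and that nonlinear terms are restricted cubic splines. The `Predictor` class, the `degrees_of_freedom` and `events_per_variable` helpers, the predictor names and all counts are hypothetical illustrations, not the method this project will develop.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Predictor:
    """Hypothetical description of one candidate predictor."""
    name: str
    kind: str        # "continuous", "binary", "categorical", "spline" or "interaction"
    levels: int = 2  # categories (categorical) or knots (spline); unused otherwise


def degrees_of_freedom(p: Predictor) -> int:
    """Number of regression coefficients the predictor adds to the model."""
    if p.kind in ("continuous", "binary"):
        return 1
    if p.kind == "interaction":
        return 1  # assumed: product of two continuous or binary predictors
    if p.kind == "categorical":
        return p.levels - 1  # dummy coding against one reference category
    if p.kind == "spline":
        return p.levels - 1  # restricted cubic spline with `levels` knots
    raise ValueError(f"unknown predictor kind: {p.kind!r}")


def events_per_variable(n_events: int, n_nonevents: int,
                        predictors: List[Predictor]) -> Tuple[float, float]:
    """Return (EPV per candidate predictor, EPV per estimated parameter)."""
    limiting = min(n_events, n_nonevents)  # size of the less frequent outcome class
    n_vars = len(predictors)
    n_params = sum(degrees_of_freedom(p) for p in predictors)
    return limiting / n_vars, limiting / n_params


# Hypothetical candidate set and event counts, for illustration only.
candidates = [
    Predictor("x1", "continuous"),
    Predictor("x2", "binary"),
    Predictor("x3", "binary"),
    Predictor("x4", "categorical", levels=5),
    Predictor("x5", "spline", levels=5),
    Predictor("x1_by_x2", "interaction"),
]
per_var, per_param = events_per_variable(120, 380, candidates)
print(per_var, per_param)  # 20.0 10.0
```

In this made-up example, six candidate predictors give a comfortable 20 events per variable, but the twelve coefficients they require bring the ratio down to exactly 10 events per parameter. The same data set can therefore look generous or borderline depending on whether variables or parameters are counted, which is one reason predictor-specific requirements may be more informative than a single threshold.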