Project

Clinical risk prediction models based on multicenter data: methods for model development and validation

Risk prediction models are developed to assist doctors in diagnosing patients, making decisions, counseling patients, or providing a prognosis. To enhance the generalizability of risk models, researchers increasingly collect patient data in different settings and join forces in multicenter collaborations. The resulting datasets are clustered: patients from one center may be more similar to each other than to patients from other centers, for example because of regional population differences or local referral patterns. Consequently, the assumption of independent observations, which underlies the statistical techniques most often used to analyze such data (e.g., logistic regression), does not hold. Much of the current clinical prediction research ignores this. Research that relies on faulty assumptions may yield misleading results and lead to suboptimal improvements in patient care.

To address this issue, I investigated the consequences of wrongly assuming independence and studied alternative techniques that acknowledge clustering throughout the process of planning a study, building a model, and validating models in new data. I used mixed and random effects methods throughout the research because they make it possible to model differences between centers explicitly, and I evaluated the proposed solutions with simulations and real clinical data. This dissertation covers sample size requirements, data collection and predictor selection, model fitting, and the validation of risk models in new data, focusing mainly on diagnostic models. The main case study is the development and validation of models for the pre-operative diagnosis of ovarian cancer, for which the multicenter dataset collected by the International Ovarian Tumor Analysis (IOTA) consortium is used.
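
As a rough illustration of what "explicitly modeling differences between centers" can look like, a random-intercept logistic regression is one common form of mixed effects model. The notation below is an assumption made for this sketch and is not taken from the dissertation: patient i in center j has outcome Y_ij and predictor vector x_ij, and u_j is a center-specific random intercept. The second line gives the usual latent-scale intraclass correlation for such a model, one way to summarize how strongly outcomes cluster within centers.

```latex
\begin{align}
  \operatorname{logit} P(Y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j)
    &= \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{ij} + u_j,
    \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \\
  \rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2 / 3}.
\end{align}
```

Setting \sigma_u^2 = 0 recovers standard logistic regression, in which all centers share the same intercept.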

The results suggested that mixed effects logistic regression models offer center-specific predictions that perform better in new patients than the predictions from standard logistic regression models. Although simulations showed that models were severely overfitted with only five events per variable, mixed effects models did not require more demanding sample size guidelines than standard logistic regression models. A case study on predictors of ovarian malignancy demonstrated that, in multicenter data, measurements may vary systematically from one center to another, indicating potential threats to generalizability. Such predictors could be detected using the residual intraclass correlation coefficient and may be excluded from risk models. In addition, a case study showed that, if statistical variable selection is used, mixed effects models are required in every step of the selection procedure to prevent incorrect inferences. Finally, case studies on risk models for ovarian cancer demonstrated that the predictive performance of risk models varied considerably between centers. This variation could be detected using meta-analytic models to analyze discrimination, calibration, and clinical utility.
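
To make the idea of meta-analyzing center-specific performance more concrete, the following is a minimal sketch of random-effects pooling with the DerSimonian-Laird estimator. This is a generic, widely used technique chosen here for illustration; the abstract does not state which meta-analytic model was used. The function name, the logit transformation of the AUC, and all numbers in the demo are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool per-center estimates with a DerSimonian-Laird random-effects model.

    estimates: per-center performance estimates (e.g., logit-transformed AUCs)
    variances: their within-center sampling variances
    Returns (pooled estimate, its standard error, between-center variance tau^2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two centers and matching lengths")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2 (truncated at 0).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-center variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical demo: made-up AUCs and logit-scale variances for four centers.
center_aucs = [0.92, 0.88, 0.95, 0.81]
logit_variances = [0.04, 0.06, 0.09, 0.05]
pooled, se, tau2 = dersimonian_laird([logit(a) for a in center_aucs], logit_variances)
print("pooled AUC:", inv_logit(pooled), "tau^2 (logit scale):", tau2)
```

A tau^2 estimate well above zero would indicate that performance differs between centers by more than sampling error alone, which is the kind of heterogeneity described above.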

In conclusion, taking differences between centers into account when planning prediction research, developing a model, and validating risk predictions in new patients offers insight into between-center heterogeneity and yields better predictions in local settings. Many methodological challenges remain, including the inclusion of predictor-by-center interactions, the optimal application of mixed effects models in new centers, and the refinement of techniques to summarize clinical utility in multicenter data. Nonetheless, the findings in this dissertation imply that current clinical prediction research would benefit from adopting mixed and random effects techniques to make full use of the information that is available in multicenter data.

Date: 3 Oct 2011 → 31 Dec 2016
Keywords: risk prediction, multicenter data, ovarian tumours, pre-operative diagnosis
Disciplines: Control systems, robotics and automation, Design theories and methods, Mechatronics and robotics, Computer theory, Modelling, Biological system engineering, Signal processing, Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences
Project type: PhD project