
Project

Cure models in survival analysis: from modelling to prediction assessment of the cure fraction.

Survival analysis examines and models the time until an event occurs. The typical event is death, from which the name ‘survival analysis’ and much of its terminology derive. Since data can only be collected over a finite period of time, the ‘time to event’ may not be observed for every individual. This happens, for example, when a patient leaves a clinical study before it ends, or is still alive at the end of the study. In such cases the death time (time to event) of that individual is unknown; we only know that it exceeds the last time at which the individual was observed. This phenomenon, called censoring, creates specific difficulties in the analysis of survival data that standard statistical methods cannot handle properly.
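The standard nonparametric way to estimate a survival function from right-censored data is the Kaplan-Meier product-limit estimator. A censored subject still counts in the risk set up to its censoring time but contributes no event. The sketch below is a minimal illustration, not code from the project. The function name and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) as a list of (time, survival) steps.

    times  : observed times, i.e. min(event time, censoring time)
    events : 1 if the event was observed, 0 if the observation is censored
    """
    n_at_time = Counter(times)  # subjects whose observed time equals t
    n_events = Counter(t for t, d in zip(times, events) if d == 1)
    at_risk = len(times)
    surv = 1.0
    steps = []
    for t in sorted(n_at_time):
        d = n_events.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            surv *= 1.0 - d / at_risk
            steps.append((t, surv))
        # Everyone observed at t (event or censored) leaves the risk set after t
        at_risk -= n_at_time[t]
    return steps


# Hypothetical toy data: the subjects observed at 3, 7 and 8 are censored
print(kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 0]))
```

On this toy sample the estimate stops decreasing after the last observed event, at 4/9. The censored subjects at 7 and 8 never bring it down.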

In traditional survival analysis, all subjects in the population are assumed to be susceptible to the event of interest: every subject has either already experienced the event or will experience it in the future. In many situations, however, a fraction of individuals (long-term survivors) will never experience the event; they are considered to be cured, or event free. For example, when a treatment is given to patients to evaluate its effect on the recurrence of a disease, some patients may be cured and will never experience a recurrence.
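The presence of a cure fraction can be stated in terms of the survival function $S(t) = P(T > t)$. Write $T = \infty$ for a cured subject, and let $\pi = P(T = \infty)$ be the cure fraction (notation introduced here for illustration). The events $\{t < T < \infty\}$ shrink to the empty set as $t$ grows, which gives:

```latex
S(t) = P(T > t) = P(t < T < \infty) + P(T = \infty)
\quad\Longrightarrow\quad
\lim_{t \to \infty} S(t) = \pi .
```

Under the traditional assumption $\pi = 0$, so $S(t) \to 0$. With a cure fraction, $S$ is an improper survival function that levels off at $\pi > 0$.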

In the literature on cure models there are two main types of models: the mixture cure model and the so-called promotion time cure model. The former models the survival function by assuming that the underlying population is a mixture of two sub-populations: the sub-population of ‘susceptibles’ (i.e. those who will experience the event and have a finite survival time) and the sub-population of ‘non-susceptibles’ (i.e. those who are event free and have an infinite survival time). The promotion time cure model, on the other hand, is motivated by a biological interpretation in terms of the time to onset of cancer, and models the survival function directly, without separating susceptibles from non-susceptibles as the mixture cure model does. In that sense, the two modelling approaches are quite different. Both models have been studied extensively: conditions under which they are identifiable have been established, and various parametric, semiparametric and nonparametric estimation procedures have been proposed and studied, both asymptotically and in finite samples.
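The two models can be compared through their population survival functions, here written without covariates. Let p be the probability of being susceptible and S_u the survival function of the susceptibles. The mixture cure model gives S(t) = 1 - p + p S_u(t). The promotion time cure model is usually written S(t) = exp(-θ F(t)), where F is a proper distribution function, so its cure fraction is exp(-θ). The sketch below evaluates both. The exponential baseline and all parameter values are hypothetical choices for illustration.

```python
import math


def exponential_survival(t, rate=1.0):
    """Survival function of an exponential distribution (hypothetical baseline)."""
    return math.exp(-rate * t)


def mixture_cure_survival(t, p_uncured, latency_survival):
    """Mixture cure model: S(t) = 1 - p + p * S_u(t).

    p_uncured        : probability of being susceptible (uncured)
    latency_survival : survival function S_u of the susceptible sub-population
    """
    return 1.0 - p_uncured + p_uncured * latency_survival(t)


def promotion_time_survival(t, theta, cdf):
    """Promotion time cure model: S(t) = exp(-theta * F(t)), F a proper cdf."""
    return math.exp(-theta * cdf(t))


latency = lambda t: exponential_survival(t, rate=0.5)
cdf = lambda t: 1.0 - exponential_survival(t, rate=0.5)
theta = math.log(2.5)  # chosen so that the cure fraction exp(-theta) is 0.4

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, mixture_cure_survival(t, 0.6, latency),
          promotion_time_survival(t, theta, cdf))
```

Both curves start at 1 and level off at a cure fraction of 0.4. The promotion time survival can itself be rewritten in mixture form with cure probability exp(-θ), so the practical difference lies mainly in how covariates enter the two models.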

In this thesis we are interested in investigating three directions related to these models. 

The first contribution is a review of the state of the art on cure models, covering the points mentioned above, together with a formal and a numerical comparison of the two models through an application to real data.

The second contribution of this thesis focuses on the mixture cure model, and more precisely on the probability of being susceptible. This quantity is often modelled parametrically through a logistic regression model. However, there is no reason to restrict the probability of being susceptible to a logistic form. We therefore propose a more flexible approach that gives this probability a single-index structure, that is, a generalised linear model whose link function is left unspecified, while the conditional survival function of the uncured subjects follows a Cox proportional hazards model.
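The model structure described above can be sketched as follows. The probability of being susceptible is p(x) = g(β'x) for some link g with values in [0, 1]. The latency follows a Cox model, S_u(t | z) = S_0(t)^exp(γ'z). The sketch only *evaluates* the model for a given g, β, γ and S_0. It does not show how the unknown link or the parameters are estimated in the thesis. All function names, the probit alternative and the numerical values are hypothetical.

```python
import math


def single_index_incidence(x, beta, link):
    """Probability of being susceptible: p(x) = g(beta' x), link g unspecified."""
    index = sum(b * xj for b, xj in zip(beta, x))
    return link(index)


def cox_latency_survival(t, z, gamma, baseline_survival):
    """Cox PH survival of susceptibles: S_u(t | z) = S_0(t) ** exp(gamma' z)."""
    linear_predictor = sum(g * zj for g, zj in zip(gamma, z))
    return baseline_survival(t) ** math.exp(linear_predictor)


def population_survival(t, x, z, beta, gamma, link, baseline_survival):
    """Mixture cure survival: S(t | x, z) = 1 - p(x) + p(x) * S_u(t | z)."""
    p = single_index_incidence(x, beta, link)
    return 1.0 - p + p * cox_latency_survival(t, z, gamma, baseline_survival)


def logistic(u):
    """The usual parametric choice for g."""
    return 1.0 / (1.0 + math.exp(-u))


def probit(u):
    """Another monotone link into [0, 1] (standard normal cdf)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


baseline = lambda t: math.exp(-t)  # hypothetical baseline survival S_0
for g in (logistic, probit):
    print(g.__name__,
          population_survival(1.0, [0.3], [1.0], [1.0], [0.5], g, baseline))
```

The logistic model is the special case g = logistic. The single-index approach leaves g unrestricted apart from mapping into [0, 1], so the same structure covers links such as the probit one above.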

Finally, besides modelling and model selection, an important topic in statistical analysis is model assessment, by means of evaluating the predictions that a given model produces. For cure models, predictions can be made for two outcomes, survival at a given time and cure status, both of which are binary. The Receiver Operating Characteristic (ROC) curve is commonly used to assess binary classification performance. However, standard ROC curves assume that the classes of the outcome are fully observed, whereas in cure survival data the cure status is unobserved for censored individuals: a censored subject may be cured or may still experience the event later. Building a ROC curve from such data is therefore a non-trivial problem. The last contribution accordingly concerns the development of ROC curves to evaluate the performance of cure status predictions made from survival data in the presence of a cure fraction.
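As a point of reference, the sketch below computes the standard empirical ROC curve and its area (AUC) when the binary labels *are* fully observed. This is exactly the assumption that fails for censored cure data, so it is not the method developed in the thesis. The convention that a higher score means "more likely uncured", the function names and the toy data are hypothetical.

```python
def empirical_roc(scores, labels):
    """Empirical ROC points (FPR, TPR), sweeping the threshold from high to low.

    scores : predicted scores (higher = more likely uncured, by assumption)
    labels : 1 = uncured, 0 = cured; assumed fully observed here
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        # Classify as uncured every subject whose score is at least c
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
roc = empirical_roc(scores, labels)
print(roc, auc(roc))
```

With censoring, the labels of censored subjects in `labels` would simply be unknown, so neither the true-positive nor the false-positive counts above could be computed directly.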

Date: 23 Nov 2016 → 9 Nov 2018
Keywords: Statistics, Survival analysis, Cure models
Disciplines: Applied economics, Economic history, Macroeconomics and monetary economics, Microeconomics, Tourism
Project type: PhD project