
Publication

Pseudo-likelihood and Estimating Equation Methodology for Incomplete Data

Book - Dissertation

In applied research such as clinical trials, two general lines of attack have been employed to address the problem of incomplete data. The first is simply to design and carry out the study in a manner that limits the amount of incomplete data. Better implementation of a more appropriate design could reduce the frequency of missing values substantially. A variety of techniques for doing this have been proposed for clinical trials (National Research Council, 2010). The second line of attack, and the focus of this thesis, is to apply analysis methods that exploit partial information in the observed data about the missing data in order to reduce the potential bias created by the missing data. The area of missing data analysis has grown substantially over the past few decades. Concern has been raised about simple methods such as complete case (CC) analysis and last observation carried forward (LOCF) (Little and Rubin, 2002; Molenberghs and Kenward, 2007; Kenward and Molenberghs, 2009; National Research Council, 2010). Their use is decreasing, while more principled methods that are valid under the missing at random (MAR) assumption are increasingly used; these include multiple imputation strategies (Rubin, 1987) and so-called direct-likelihood or direct Bayesian analysis. These methods are based on the property of ignorability, which ensures that such analyses are valid under MAR, supplemented with mild regularity conditions, even without explicitly modeling the missing data mechanism, provided that all incomplete sequences are subjected to analysis (Rubin, 1976; Little and Rubin, 2002; Molenberghs and Kenward, 2007; Fitzmaurice et al., 2009). While ignorability follows under likelihood inference, this is not generally true for non-likelihood approaches such as generalized estimating equations (GEE) and pseudo-likelihood (PL). Likelihood methods enjoy many desirable properties, such as efficiency under appropriate regularity conditions and the ability to calculate functions of interest based on the proposed parametric model.
For non-Gaussian outcomes, however, not only can the specification of the likelihood function be cumbersome, but estimation of the parameters can also be computationally intensive. In addition, fully specifying the joint probability model carries the risk of misspecification. The difficulty of evaluating the likelihood for models with discrete correlated data has therefore motivated alternative methods of estimation, the most popular being GEE and PL. While GEE methods replace the score equations with alternative functions, in pseudo-likelihood the likelihood itself is replaced by a more tractable expression. When attention is restricted to specification of the first moments (i.e., the mean structure) of the outcome sequence only, GEE leads to valid inferences by circumventing the need to address the association structure. Because of its frequentist nature, GEE in its basic form, as applied to incomplete data, is valid only under the missing completely at random (MCAR) assumption. To allow valid use of GEE under MAR, GEE has been extended to weighted generalized estimating equations (Robins, Rotnitzky and Zhao, 1995) and doubly robust GEE (Scharfstein, Rotnitzky and Robins, 1999; Bang and Robins, 2005; Tsiatis, 2006; Carpenter, Kenward and Vansteelandt, 2006; Molenberghs and Kenward, 2007; Rotnitzky, 2009; Birhanu et al., 2011). In contrast with GEE, PL methods can easily accommodate association (Yi, Zeng and Cook, 2011; He and Yi, 2011). Broadly speaking, one might consider marginal or conditional pseudo-likelihood. Pseudo-likelihood is closely related to, but different from, full likelihood and is therefore not guaranteed to be valid under MAR, although in some specific cases it may be, because the conditions for ignorability provided by Rubin (1976) are sufficient but not always necessary. A substantial part of our work (Chapters 4, 5, 6 and 7) was devoted to these alternatives to full likelihood, PL and GEE, with incomplete data.
In their basic form, both GEE and PL are valid only under the strongest missing data mechanism, MCAR. The aim of our work was to study in more depth the extensions needed to ensure the validity of these methods under the weaker missing data mechanism, MAR. MCAR is a sufficient condition for the validity of GEE and PL. A number of extensions and modifications of GEE and PL, such as weighted GEE (WGEE), multiple imputation GEE (MI-GEE), doubly robust GEE (DR-GEE), and the singly robust and doubly robust versions of PL, are studied in the thesis.
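The weighting idea behind WGEE can be illustrated on a toy problem. The sketch below is not taken from the thesis: the data, the function names, and the choice to estimate observation probabilities by simple stratum fractions are all assumptions made for illustration. It solves the weighted estimating equation sum_i (r_i / pi_i)(y_i - mu) = 0 for a single mean, where r_i indicates whether y_i is observed and pi_i is the estimated probability of observation given a fully observed covariate x_i.

```python
from collections import defaultdict


def stratum_observation_probabilities(x, r):
    """Estimate P(R = 1 | X = x) by the observed fraction in each stratum of x.

    Hypothetical helper: a real analysis would typically fit a model
    (e.g. logistic regression) for the missingness mechanism.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_observed, n_total]
    for xi, ri in zip(x, r):
        counts[xi][0] += ri
        counts[xi][1] += 1
    return {k: n_obs / n_tot for k, (n_obs, n_tot) in counts.items()}


def complete_case_mean(y, r):
    """Unweighted mean of the observed outcomes only."""
    observed = [yi for yi, ri in zip(y, r) if ri == 1]
    return sum(observed) / len(observed)


def ipw_mean(y, x, r):
    """Solve sum_i (r_i / pi_i) * (y_i - mu) = 0 for mu."""
    pi = stratum_observation_probabilities(x, r)
    numerator = 0.0
    denominator = 0.0
    for yi, xi, ri in zip(y, x, r):
        if ri == 1:  # only observed outcomes contribute
            weight = 1.0 / pi[xi]
            numerator += weight * yi
            denominator += weight
    return numerator / denominator


# Toy data (assumed): outcomes in stratum x = 1 are larger and more often missing,
# so missingness depends on the observed covariate x.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 1.0, 1.0, 1.0, 3.0, None, None, None]  # None marks a missing outcome
r = [1, 1, 1, 1, 1, 0, 0, 0]

cc_estimate = complete_case_mean(y, r)
ipw_estimate = ipw_mean(y, x, r)
print(cc_estimate, ipw_estimate)
```

In this toy setting the complete case mean is pulled toward the stratum with fewer missing values, whereas the weighted version gives the single observed outcome in the sparsely observed stratum a larger weight. The sketch covers only a single mean with stratum-based weights; it does not address correlated outcomes, dropout over time, or the doubly robust extensions.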
Number of pages: 155
Publication year: 2012
Accessibility: Open