Project

A p-value pooling approach to significance testing with multiple imputation for missing data.

Almost all data are incomplete to some extent. Incomplete data are generally considered problematic because they can reduce power and substantially bias the resulting inferences. Multiple imputation has emerged as one of the state-of-the-art methods for handling missing data. The approach involves creating m datasets in which all missing values are imputed. Each dataset is then analyzed separately and, in the final step, the estimates from the m datasets are pooled using existing pooling rules. Whereas multiple imputation generally produces unbiased estimates and standard errors, existing evidence suggests that current approaches to testing the significance of multiply imputed estimates have several problems (e.g., lack of power, large sample size requirements, dependence on the percentage of missingness). Since most researchers who use multiple imputation do so in conjunction with frequentist hypothesis testing, the lack of reliable techniques for testing the statistical significance of multiply imputed estimates is highly problematic. Consequently, research exploring the statistical properties of existing significance testing techniques is urgently needed. In addition, the proposed doctoral project will focus on developing a robust technique for significance testing of pooled results obtained from multiply imputed datasets, even when the analyses of the imputed datasets do not yield estimates but only other quantities (e.g., p-values). We propose pooling p-values (or test statistics) as a means of improving on the existing methods of significance testing in multiply imputed datasets, in particular when no estimates are available. Finally, we will implement the new technique in the form of an R package that works seamlessly with existing multiple imputation packages such as mice and Amelia. In summary, the proposed doctoral project has the following four objectives:
1. Evaluating the appropriateness of the existing methods of generating significance levels from multiply imputed datasets.
2. Developing a new approach to significance testing in multiply imputed datasets that is based on properly pooling p-values.
3. Comparing the existing methods with the newly developed p-value pooling method.
4. Implementing the p-value pooling method in an R package.
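For orientation, the "existing pooling rules" mentioned above are commonly Rubin's rules, which combine the m point estimates and their squared standard errors into one estimate, a total variance, and a degrees-of-freedom value for a t-based test. The following is a minimal Python sketch of those rules, not the p-value pooling method the project sets out to develop. The function name `pool_rubin` and the returned dictionary keys are illustrative choices, and the degrees of freedom use the classical large-sample formula rather than a small-sample adjustment.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    `estimates` holds one estimate per imputed dataset; `variances` holds the
    matching squared standard errors. Both names are illustrative.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b

    if b == 0:
        df = inf                 # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / u_bar   # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2

    return {
        "estimate": q_bar,
        "total_variance": total,
        "df": df,
        "t_statistic": q_bar / sqrt(total),  # tests H0: parameter = 0
    }
```

The t statistic would then be referred to a t distribution with `df` degrees of freedom. This route requires an estimate and a standard error from every imputed dataset, which is the requirement the proposed p-value pooling approach aims to relax.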

Date: 1 Oct 2019 → 1 Oct 2023

Keywords: Multiple imputation, pooling p-values, missing data

Disciplines: Biostatistics

Project type: PhD project
