Project

Estimation in large-scale heterogeneous and heteroscedastic quantile regression (R-7534)

Statisticians are frequently confronted with massive data sets from various scientific research domains. We look in particular at large-scale data with an extremely large number of covariates and an extremely large sample size, where the existence of sub-populations complicates the model and where, moreover, the number of sub-populations grows with the sample size. Additionally, such data typically exhibit large deviations from the classical homoscedasticity assumption, under which the error of the model is independent of the observed covariates. The computational difficulty comes from the fact that such data are so large that even a simple matrix multiplication cannot be carried out on a single computer. We propose a semi-parametric, time-varying quantile regression framework for modeling massive heterogeneous data. We model a quantile of the response of interest (instead of the mean, as in classical linear regression) as a function of the covariates, through a common non-parametric effect shared by all sub-populations and a varying-coefficient effect specific to each sub-population. The variance in this heteroscedastic model is modeled non-parametrically. The non-parametric components (for commonality, heterogeneity and heteroscedasticity) are estimated with P-splines. In addition, we test the significance of both the heterogeneity and heteroscedasticity effects across this large number of sub-populations.
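To illustrate what modeling a quantile instead of the mean involves, the following minimal sketch uses the standard check (pinball) loss of quantile regression on simulated heteroscedastic data. It is not the project's estimator: the function names, the toy data-generating process and the restriction to a constant fit are assumptions made purely for illustration, and the P-spline and sub-population components described above are not included.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    residuals = np.asarray(residuals, dtype=float)
    return np.where(residuals >= 0, tau * residuals, (tau - 1.0) * residuals)

def constant_quantile_fit(y, tau):
    """Return the observed value minimising the total check loss over constants."""
    y = np.asarray(y, dtype=float)
    # The total loss is convex and piecewise linear in the constant,
    # so a minimiser can always be found among the data points.
    totals = [pinball_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(totals))]

# Hypothetical toy data: the noise scale grows with the covariate x,
# so the errors are not independent of x (heteroscedasticity).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = 1.0 + 2.0 * x + (0.2 + x) * rng.standard_normal(500)

# Minimising the check loss at tau = 0.9 recovers an empirical 0.9-quantile.
q90 = constant_quantile_fit(y, 0.9)
share_below = np.mean(y <= q90)
```

Here `share_below` is the fraction of responses at or below the fitted value, which should be close to the chosen level `tau`. Replacing the constant by a function of the covariates gives a quantile regression fit in the usual sense.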
Date: 1 Jan 2017 → 31 Dec 2019
Keywords: heteroscedastic, massive data, quantile regression
Disciplines: Applied mathematics in specific fields, Statistics and numerical methods