New statistical tools for analysing complex survey data in the social sciences
Social research increasingly requires investigators to gather large and complex data that are frequently cross-national in nature. Large questionnaires are used to measure skills, attitudes, and traits, and the results are stored in huge datasets. A typical example of a complex dataset is the PISA study (Programme for International Student Assessment), a triennial international survey that aims to evaluate education systems worldwide by testing 15-year-old students’ skills and knowledge. Statistical techniques for analysing these complex data have to deal adequately with the combination of 1) the clustering of students within countries, 2) the categorical response options of the questionnaires, such as “correct”/“incorrect”, and 3) missing values that arise when respondents do not answer all questions. Unfortunately, currently available techniques fail to deal adequately with such data, and as a consequence researchers often adopt suboptimal analysis techniques. In this project, I aim to develop new statistical techniques to analyse large and complex data in a correct and practical manner. To this end, I will 1) develop a general statistical method for analysing complex data, 2) provide solutions for clustered data with missing values and develop fit measures to indicate whether a model fits the data, and 3) develop freely available software for researchers. Moreover, this research will lead to clear guidelines for researchers in the social sciences dealing with large and complex data.