Publication

Data Science Approaches for the Analysis and Interpretation of Training Load Data of Athletes

Book - Dissertation

Research on the analysis of real-world sports data dates back at least to 1958 (Lindsey 1959; Rubin 1958). Advances in technology have caused an explosion in the amount of sports-related data. This abundance of data has attracted the interest of both the academic community and industry. The aim of the sports analytics community is to leverage the available data to help decision makers gain a competitive advantage (Alamar and Mehrotra 2011). The advent of wearable technology has yielded a new data source with a lot of unexplored potential. These data can help practitioners to monitor athletes during daily life activities (Kwapisz et al. 2011) and rehabilitation (Um et al. 2017; Whelan et al. 2016), to quantify their training loads (Bourdon et al. 2017; Halson 2014; Jaspers, Brink, et al. 2017), and to analyze their risk of injury (Gabbett and Ullah 2012).

From a data science perspective, these continuous monitoring data pose several interesting challenges. First, combining the data of different athletes is non-trivial due to inter-individual differences. Second, because the behavior of athletes can change and because often only limited individual data are available, it is also non-trivial to model the data on an individual level. Third, the use of subjective measures to quantify certain aspects of the athlete (e.g., perceived wellness), confounding factors (e.g., running speed), and missing values further complicate the analysis of these data.

In this thesis we evaluated how data science techniques can provide value to the analysis and interpretation of athletes' training load data. Our main focus is on the analysis of training load data from soccer players and outdoor runners. Specifically, we examined three relevant relationships. First, we studied how soccer players perceive external loads. Second, we modeled the relationship between external and internal load and the perceived wellness of soccer players. Third, we analyzed the relationship between biomechanical movement data of outdoor runners and their perceived fatigue status.

We presented three types of evidence to support the dissertation statement. First, we found that both data-driven feature selection methods and simple statistical features can complement expert knowledge. Second, we illustrated that group models can be used to individually monitor an athlete when limited-to-no prior data are available for that athlete. Third, we showed that machine learning techniques are well suited to model the complex relationships that are relevant for the analysis of athletes' training load data: non-linear relationships, relationships between objective and subjective variables, and relationships where multicollinearity exists among the input variables.

Additionally, we formulated some lessons learned for data scientists. We argued that modeling the context of an athlete's data, either explicitly or implicitly, can improve the performance of predictive models by adjusting for inter- and intra-subject differences and external factors. We presented several such strategies: standardizing features relative to an individual baseline, predicting a normalized target variable instead of the originally reported target variable, and adding the previous state as a feature. Moreover, we identified subtle data dependencies that hinder obtaining an unbiased estimate of a model's ability to generalize to unseen data.

We identified three limitations of the current thesis. First, we evaluated the methodologies to monitor soccer players on the data of only one club. Second, the protocol for collecting outdoor data from runners experimentally controlled for total distance, intensity, and running surface, and might have introduced a bias towards reporting higher fatigue scores near the end of the protocol. Third, RPE, a subjective measure used in every relationship of this thesis, quantifies muscular fatigue as well as cardiovascular and psychological fatigue.

Future research in this area can benefit from an interdisciplinary collaboration between data scientists, sports scientists, and other domain experts. A close collaboration throughout all phases of the data science process can further advance the state of the art. First, it will improve the quality of the data that are being collected. Second, it can help to properly contextualize the data when modeling relevant relationships. Third, it will allow obtaining an unbiased estimate of how well these predictive models generalize.
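
Two of the contextualization strategies named in the abstract, standardizing a feature relative to an individual baseline and adding the previous state as a feature, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the thesis: the function name add_context_features, the record keys athlete, load, and rpe, and the choice of each athlete's mean and population standard deviation as the baseline are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def add_context_features(sessions):
    """Add per-athlete context features to time-ordered session records.

    Hypothetical input: a list of dicts with keys 'athlete', 'load'
    (an external load measure), and 'rpe' (the reported target).
    Output rows are grouped by athlete, in order of first appearance.
    """
    by_athlete = defaultdict(list)
    for session in sessions:
        by_athlete[session["athlete"]].append(session)

    rows = []
    for athlete_sessions in by_athlete.values():
        loads = [s["load"] for s in athlete_sessions]
        # Assumed baseline: the athlete's own mean and spread over the
        # sessions passed in. To avoid leaking information, these should
        # be sessions from the training period only.
        baseline = mean(loads)
        spread = pstdev(loads) or 1.0  # guard against zero spread
        previous_rpe = None  # no previous state for the first session
        for s in athlete_sessions:
            rows.append({
                **s,
                "load_z": (s["load"] - baseline) / spread,
                "prev_rpe": previous_rpe,
            })
            previous_rpe = s["rpe"]
    return rows
```

Expressing the load relative to the athlete's own baseline lets a group model compare sessions across athletes with different typical loads, while the previous-state feature gives the model some of the athlete's recent history.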
Year of publication: 2019
Accessibility: Open