Boosting biomedical imaging data analysis through advanced machine learning
Mass spectrometry imaging (MSI) is a promising label-free molecular imaging technology that maps the spatial distribution of hundreds to thousands of biomolecules in a tissue within a single experiment. Because it is label-free and untargeted, it requires neither prior knowledge about the tissue nor target-specific reagents. Owing to these characteristics, MSI is becoming increasingly popular and is being rapidly adopted for applications such as biomarker discovery, clinical diagnostics, and drug delivery studies.
The data generated by a single experiment are large, complex, and high-dimensional, which poses great challenges for computational analysis in MSI. This thesis therefore focuses on the development and application of computational methods for MSI data. More specifically, several advanced machine learning algorithms and multimodal strategies were developed and applied, yielding significant improvements over current computational analysis in MSI. The advent of machine learning and deep learning has greatly facilitated the analysis of complex, spatial, high-dimensional data. Accordingly, machine learning and deep learning models were applied in this thesis to improve the performance of unsupervised learning (clustering) and supervised learning (classification) tasks in MSI. Additionally, an advanced multimodal integration pipeline was proposed, enabling the direct correlation and comparison of spatial multi-omics datasets.
First, we focused on unsupervised learning methods, which support the exploration of underlying patterns in complex MSI data. We applied a pre-trained neural network to extract high-level features from ion images, and then used these features to cluster the ion images by their spatial expression patterns. We compared the proposed strategy with the standard clustering pipeline, which uses the ion images themselves as input to the clustering task. The deep learning based strategy produced more fine-grained clusters and greater consistency in cluster assignment. Additionally, we introduced the relative isotope ratio metric to evaluate clustering quality quantitatively, and benchmarked our results on public data. The results show the potential of neural network derived representations of ion images, which can be incorporated into any MSI-focused unsupervised or supervised machine learning pipeline.
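The feature-then-cluster idea described above can be illustrated with a minimal sketch. It is not the pipeline used in the thesis: the `embed` function below is a hypothetical stand-in (block-averaged intensities) for the pre-trained network, and plain k-means is assumed as the clustering algorithm, which the text does not specify. All function names are illustrative.

```python
import numpy as np

def embed(ion_image, grid=4):
    """Placeholder embedding (assumption): block-averaged, L2-normalised
    intensities. In the strategy described in the text, this step would be
    the feature vector produced by a pre-trained neural network."""
    h, w = ion_image.shape
    bh, bw = h // grid, w // grid
    blocks = ion_image[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw)
    feat = blocks.mean(axis=(1, 3)).ravel()
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat

def kmeans(features, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [features[0]]
    for _ in range(1, k):
        # squared distance of every sample to its closest existing center
        d = np.min([np.sum((features - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_ion_images(ion_images, k):
    """Embed each ion image, then group the images by embedding similarity."""
    features = np.stack([embed(img) for img in ion_images])
    return kmeans(features, k)
```

Given a list of 2D arrays, one per m/z channel, `cluster_ion_images(ion_images, k)` returns one cluster label per ion image. Swapping `embed` for a real network embedding leaves the rest of the sketch unchanged.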
One of the promising applications of MSI is clinical decision support and, ultimately, diagnostics, in which MSI data are used to build a classifier that supports pathologists’ diagnosis and tumor typing. Most MSI-focused studies collect microscopy data in tandem with MSI data, yet this imaging modality is generally omitted from the downstream analysis. Few studies have used microscopy data together with MSI for clinical diagnosis, that is, for the classification task. We therefore proposed a multimodal MSI classification pipeline, applied in a melanoma study, that uses a pre-trained neural network to extract morphological features from microscopy data. These morphological features were integrated with the MSI data to improve downstream classification and, in turn, melanoma diagnosis. Compared with unimodal strategies, the multimodal pipeline achieved significantly better nested-cross-validated ROC-AUCs. More importantly, because the pre-trained neural network requires neither training nor labels, the multimodal pipeline is efficient and flexible. It can thus be readily applied in other experimental settings where microscopy is acquired in tandem with MSI.
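To make the evaluation scheme concrete, the sketch below shows how a nested-cross-validated ROC-AUC can be computed for fused features. Several parts are assumptions rather than the thesis method: fusion is taken to be simple feature concatenation, the classifier is a ridge-regularised least-squares scorer chosen only for brevity, and the fold counts and `lams` grid are arbitrary. Every fold must contain both classes, otherwise the AUC is undefined.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied pairs count as one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

def fuse(msi, morph):
    """Early fusion (assumption): concatenate per-sample feature vectors."""
    return np.hstack([np.asarray(msi, float), np.asarray(morph, float)])

def fit_ridge(X, y, lam):
    """Stand-in classifier: ridge-regularised least squares with a bias term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def stratified_folds(y, k):
    """Deterministic stratified folds: deal each class's samples round-robin."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        for i, idx in enumerate(np.flatnonzero(y == cls)):
            folds[i % k].append(idx)
    return [np.array(f) for f in folds]

def nested_cv_auc(X, y, lams=(0.01, 1.0, 100.0), k_outer=3, k_inner=2):
    """Outer loop estimates performance; inner loop tunes lam on training data only."""
    outer_aucs = []
    for test_idx in stratified_folds(y, k_outer):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]

        def inner_score(lam):
            aucs = []
            for val in stratified_folds(ytr, k_inner):
                fit = np.setdiff1d(np.arange(len(ytr)), val)
                w = fit_ridge(Xtr[fit], ytr[fit], lam)
                aucs.append(roc_auc(ytr[val], predict(w, Xtr[val])))
            return np.mean(aucs)

        best_lam = max(lams, key=inner_score)
        w = fit_ridge(Xtr, ytr, best_lam)
        outer_aucs.append(roc_auc(y[test_idx], predict(w, X[test_idx])))
    return float(np.mean(outer_aucs))
```

Calling `nested_cv_auc(fuse(msi, morph), y)` and `nested_cv_auc(np.asarray(msi, float), y)` on the same samples gives a like-for-like multimodal versus unimodal comparison, because the hyperparameter is never tuned on the held-out outer fold.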
Finally, we extended the multimodal MSI pipeline to spatial multi-omics applications. We obtained valuable spatial multi-omics datasets from high-risk prostate cancer patients, comprising in situ lipidomics acquired via MSI and spatial transcriptomics (ST) data. The proposed multimodal pipeline successfully registered and aligned these two complex, high-dimensional datasets acquired from serial sections, enabling direct comparison and correlation of lipidomics and transcriptomics. The proposed spatial multi-omics integration pipeline is thus a key stepping stone for downstream analysis of the integrated datasets: it enables mapping molecular heterogeneity and the intricate connections between molecular layers, toward a better understanding of cancer biology and better patient stratification.
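As a rough illustration of what registering two modalities and then correlating them involves, the sketch below fits a 2D affine transform from paired landmarks, maps ST spot coordinates into MSI pixel space, and correlates a lipid intensity with a gene's expression at matched locations. The registration method in the thesis is not described here, so the landmark-based affine fit, the nearest-pixel matching, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map sending src landmarks onto dst landmarks.
    Needs at least three non-collinear landmark pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords, (n, 3)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # affine parameters, (3, 2)
    return M

def apply_affine(M, pts):
    """Map points (n, 2) through the affine transform M."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

def nearest_pixel(mapped_spots, pixel_xy):
    """Index of the closest MSI pixel for every mapped ST spot."""
    d = ((mapped_spots[:, None, :] - pixel_xy[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def matched_correlation(lipid_per_pixel, gene_per_spot, pixel_index):
    """Pearson correlation between a lipid intensity (sampled at the matched
    MSI pixels) and a gene's expression at the ST spots."""
    lipid_at_spots = np.asarray(lipid_per_pixel, dtype=float)[pixel_index]
    return float(np.corrcoef(lipid_at_spots, np.asarray(gene_per_spot, float))[0, 1])
```

In practice, serial sections often deform non-rigidly, so an affine fit like this would at most serve as an initial alignment.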