Publication

Unravelling the determinants of molecular host-pathogen interactions with machine learning

Book - Dissertation

In this thesis we study the application of data mining and machine learning techniques in the broad context of biomolecular networks. We cover three main topics, each related to a different type of network data, analysis method and underlying research question. In the first part, we explore a new framework, based on frequent itemset mining and association rule mining, to distil biologically relevant information from host-pathogen protein-protein interaction networks and their annotation data into a more interpretable format. The technique translates expert knowledge into a rule-based summary that also lends itself well to visualization, although it remains challenging to find the appropriate level of granularity for the specific taxonomic subject of interest. The second part focuses on the well-studied problem of finding subgraph patterns in a larger graph. Here we build upon earlier work that aims to uncover those subgraphs that are associated with a specific set of nodes of interest. This approach extends the widely used enrichment analysis methodologies by integrating network structure and functional annotations in order to discern novel biological subgraphs that are enriched in the targets of interest. We present a software package, termed MILES, which adds functionality and visualization capabilities to the original work. The final part of this work is situated in the field of immunoinformatics. The molecular interactions between epitopes and T-cell receptors (TCRs) play an important role in the adaptive immune system. These interactions can also be represented as a network, and we showcase a novel technique for predicting new edges within it. Specifically, we employ convolutional neural networks and a feature representation method inspired by image classification to create a generic classification model that operates directly on the amino acid sequences of the two molecular partners. In addition, we compare validation strategies to assess the generalization performance for both seen and unseen epitopes, and we discuss various challenges that are inherent to TCR-epitope data. While our method shows promise, the open problem of predicting TCR-epitope interactions for unseen epitopes clearly requires further improvement, especially in terms of the diversity of the available data.
Number of pages: 107
Year of publication: 2021
Keywords: Doctoral thesis
Accessibility: Open