Publication

Biological data mining

Book - Dissertation

Subtitle: From interestingness measure to deep learning
Biological data mining has been an active research area in bioinformatics in recent years. It is expected to unlock a new stage of biomedical research by using computational methods to discover knowledge in the large volume of available biological data. This knowledge will generate novel insights into the mechanisms of biological systems. Furthermore, it will support the design of new drugs and the development of improved solutions for informed clinical decision making. In this dissertation, we present machine learning techniques for mining interesting patterns and useful knowledge from biological data in several case studies. More specifically, the dissertation elaborates on the following three problems: mining unexpected patterns from transaction data, building associative classifiers based on association rule mining, and identifying compound-protein interactions using deep neural networks. The first problem focuses on finding unexpected patterns in data. These patterns expose a gap in prior knowledge or suggest an aspect of the data that deserves further investigation. We propose a novel approach that combines association rule mining with a clustering algorithm to discover the unexpected patterns. The second problem concerns mining reliable patterns and constructing an interpretable classification model. Interpretability of machine learning models is critical in domains with significant social or financial impact, such as healthcare and disease diagnosis. The proposed classification model is a rule list, which makes a single prediction based on multiple rules. We built the model using association rule mining and multi-objective optimization. The last problem concerns compound-protein interaction prediction. Identifying interactions between compounds and proteins is an essential task in drug discovery and development.
Such prediction tools can be used to screen compound libraries against given protein targets to achieve desired effects, or to test given compounds against possible off-target proteins to avoid undesired effects. To tackle the problem, we developed a novel approach combining a graph convolutional network and a one-dimensional convolutional neural network. These neural networks encode the data objects, i.e. the compounds or proteins, into intermediate representations, which are then used to predict the interaction. We also applied an explanation technique to visualize the contributions of the protein regions to the prediction outcome. We conclude with an overview of our main contributions and a discussion of potential future work to improve our proposed methods.
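As background for the association rule mining used in the first two case studies, the following minimal sketch computes three standard interestingness measures (support, confidence and lift) for a rule over transaction data. It is an illustration only and is not taken from the dissertation: the function names, the item names and the toy transactions are all hypothetical, and the dissertation's own measures and algorithms may differ.

```python
# Illustrative sketch only: standard support / confidence / lift for a rule
# "antecedent -> consequent". All names and data here are hypothetical.

def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    return count_containing(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    covered = count_containing(transactions, antecedent)
    if covered == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return count_containing(transactions, antecedent | consequent) / covered

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's base rate.

    Values near 1 suggest independence; values far from 1 indicate a
    positive (> 1) or negative (< 1) association.
    """
    base_rate = support(transactions, consequent)
    if base_rate == 0:
        raise ValueError("consequent never occurs in the transactions")
    return confidence(transactions, antecedent, consequent) / base_rate

# Toy transaction data (hypothetical item names).
transactions = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]

rule_support = support(transactions, {"geneA", "geneB"})
rule_confidence = confidence(transactions, {"geneA"}, {"geneB"})
rule_lift = lift(transactions, {"geneA"}, {"geneB"})
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the rule geneA -> geneB has a lift below 1, meaning geneB appears slightly less often alongside geneA than its overall frequency would predict. Deviations of this kind from an expected association are one simple way to think about what makes a pattern interesting.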
Number of pages: 115
Publication year: 2022
Keywords: Doctoral thesis
Accessibility: Open