
Publication

Rank Matrix Factorisation and its Applications

Book - Dissertation

Rank data, in which each row is a complete or partial ranking of the available items (columns), is ubiquitous. It can represent, for instance, user preferences, gene expression levels, and the outcomes of sports events. While rank data has been analysed in the data mining literature, mining patterns in such data has so far received little attention. To address this gap, in this research we study pattern set mining in rank data, i.e., the discovery of a small set of patterns that describes the structure of the data, and its applications in data mining and bioinformatics. First, we propose a general framework based on matrix factorisation for mining different types of patterns in rank data. Rather than relying on traditional linear algebra for matrix factorisation, we employ semiring theory, which results in a more elegant way of aggregating rankings. Subsequently, we introduce two instantiations of the framework: Sparse RMF and ranked tiling. Sparse RMF mines a set of sparse rank vectors that summarise a given rank matrix succinctly and show the main categories of rankings. Ranked tiling discovers a set of regions in a rank matrix that have high ranks. Such regions are interesting because they can show local associations between subsets of the rows and subsets of the columns of the given matrix. Finally, we propose to use ranked tiling to formally define the concept of driver pathways, which are the molecular mechanisms driving tumorigenesis. Given the discovered driver pathways, we can find cancer subtypes, which are groups of tumour samples that share a unique combination of driver pathways.
Year of publication: 2017
Accessibility: Open