Project

Rank matrix factorisation and its applications

Rank data, in which each row is a complete or partial ranking of the available items (columns), is ubiquitous. It can represent, for instance, the preferences of users, levels of gene expression, and the outcomes of sports events. While rank data has been analysed in the data mining literature, mining patterns in such data has so far received little attention. To address this gap, this research studies pattern set mining in rank data, i.e., the discovery of a small set of patterns that describes the structure of the data, and its applications in data mining and bioinformatics.

First, we propose a general framework based on matrix factorisation for mining different types of patterns in rank data. Rather than relying on traditional linear algebra for matrix factorisation, we employ semiring theory, which results in a more elegant way of aggregating rankings. Subsequently, we introduce two instantiations of the framework: Sparse RMF and ranked tiling. Sparse RMF mines a set of sparse rank vectors that succinctly summarise a given rank matrix and show its main categories of rankings. Ranked tiling discovers a set of data regions in a rank matrix that have high ranks. Such regions are interesting because they can show local associations between subsets of the rows and subsets of the columns of the given matrix. Finally, we propose to use ranked tiling to formally define the concept of driver pathways, which are the molecular mechanisms driving tumorigenesis. Given the discovered driver pathways, we can find cancer subtypes, which are groups of tumour samples having a unique combination of driver pathways.
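
To make the two ideas above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the formulation used in the project: the choice of the max-times semiring, the function names `semiring_matmul` and `tile_score`, the threshold `theta`, and all example matrices are hypothetical. The first function shows how replacing ordinary addition with `max` aggregates overlapping rank patterns without summing them; the second scores a candidate tile by how far its entries lie above a rank threshold.

```python
from typing import Callable, List, Sequence

Matrix = List[List[int]]


def semiring_matmul(
    C: Matrix,
    F: Matrix,
    add: Callable[[int, int], int] = max,            # assumed semiring "addition"
    mul: Callable[[int, int], int] = lambda a, b: a * b,  # assumed semiring "multiplication"
    zero: int = 0,                                    # identity element for `add` on ranks >= 0
) -> Matrix:
    """Matrix product in which (+, *) are replaced by the given semiring operations."""
    k, m = len(F), len(F[0])
    result = []
    for c_row in C:
        out_row = []
        for j in range(m):
            acc = zero
            for p in range(k):
                acc = add(acc, mul(c_row[p], F[p][j]))
            out_row.append(acc)
        result.append(out_row)
    return result


def tile_score(M: Matrix, rows: Sequence[int], cols: Sequence[int], theta: int) -> int:
    """Sum of (rank - theta) over the tile; positive when the tile holds mostly high ranks."""
    return sum(M[r][c] - theta for r in rows for c in cols)


# Two hypothetical sparse rank patterns over four items (0 = item not covered).
F = [[4, 3, 0, 0],
     [0, 0, 2, 4]]

# Boolean indicator matrix: which patterns each of three rows uses.
C = [[1, 0],
     [0, 1],
     [1, 1]]

approx = semiring_matmul(C, F)
print(approx)  # [[4, 3, 0, 0], [0, 0, 2, 4], [4, 3, 2, 4]]

# A hypothetical rank matrix: each row ranks four items from 1 (low) to 4 (high).
M = [[4, 3, 1, 2],
     [4, 2, 3, 1],
     [1, 2, 4, 3]]

print(tile_score(M, rows=[0, 1], cols=[0, 1], theta=2))  # 5
```

In the third row of `approx`, the two patterns are combined with `max` instead of a sum, so the reconstructed entries remain valid ranks. The tile covering rows 0 and 1 and columns 0 and 1 scores 5 because three of its four entries exceed the assumed threshold of 2.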

Date: 1 Oct 2011 → 11 Jan 2017
Keywords: rank matrix factorisation, Constraint Programming, pattern set mining, ranked tiling, Sparse RMF, cancer subtype, data mining
Disciplines: Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences
Project type: PhD project