Project
Scalable, interpretable and versatile models of relational data: design, induction and inference.
This PhD aims to develop methods for scaling up relational machine learning algorithms. Unlike propositional learners, relational learners can exploit the complex link structure between entities in a dataset to learn a concept.
Two developments reinforce the need for scalable relational learning: (1) relational data is being produced at a staggering rate, and current relational learners cannot keep up; and (2) most machine learning research to date has focused on propositional data, despite the prevalence of relational data.
A number of powerful relational learning algorithms exist, but they all suffer from poor scalability. In practice, this means they can handle only small amounts of data and are expensive to deploy. Existing attempts to resolve these scalability issues remain unsatisfactory.
This research proposes several novel approaches inspired by innovations from the database community. Specifically, we examine how techniques used by database query optimizers can be exploited in non-standard ways to speed up relational learners. We will investigate the use of (1) bit-level data representations, (2) query size estimation techniques, and (3) (random) rule generation strategies that find relational patterns faster.
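To illustrate the first direction, the sketch below shows one way a bit-level data representation could speed up pattern evaluation: if each unary predicate over a domain of entities is stored as a bitmask, testing which entities satisfy a conjunction of predicates becomes a single bitwise AND instead of a loop over tuples. This is a minimal illustration under our own assumptions, not the proposal's actual design; all names and predicates are hypothetical.

```python
# Hypothetical sketch of a bit-level representation for relational data.
# Each unary predicate over n entities is an n-bit integer mask, so a
# conjunctive query reduces to bitwise AND over the predicate masks.

def to_bitmask(entities):
    """Encode a set of entity ids as an integer bitmask."""
    mask = 0
    for e in entities:
        mask |= 1 << e
    return mask

def conjunction(*masks):
    """Entities satisfying all predicates: bitwise AND of their masks."""
    result = ~0 if masks else 0  # ~0 acts as the 'all entities' mask
    for m in masks:
        result &= m
    return result

def decode(mask):
    """Recover the sorted entity ids set in a bitmask."""
    ids, i = [], 0
    while mask:
        if mask & 1:
            ids.append(i)
        mask >>= 1
        i += 1
    return ids

# Illustrative predicates over entities 0..7.
smokes = to_bitmask({0, 2, 3, 5})
cancer = to_bitmask({2, 5, 7})
print(decode(conjunction(smokes, cancer)))  # → [2, 5]
```

The same idea extends to binary relations (one bitmask per row of the adjacency matrix), which is where the speedups for relational pattern matching would come from.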
If successful, this research will produce scalable relational machine learning systems that can be applied immediately in industrial contexts.