Iterative and multi-level methods for Bayesian multirelational factorization with features.
Many machine learning problems (classification, clustering, etc.) can be formulated as the factorization of an incompletely filled matrix, where the goal is to predict the unknown values. Such methods have been successful in large-scale recommender systems, most prominently in the Netflix Prize challenge, which aimed at predicting the movie preferences of users from roughly 100 million movie ratings submitted by about half a million users. In particular, we are generalizing Bayesian Probabilistic Matrix Factorization by developing a method, Bayesian multirelational factorization with features, that can simultaneously factorize multiple relations (i.e., multiple matrices) and also incorporate additional entity-level features. While basic approaches to matrix factorization rely on classical optimization, our approach relies on Bayesian Markov chain Monte Carlo, specifically Gibbs sampling. Each Gibbs iteration can be reformulated as a sequence of linear system solves with many right-hand sides. If the dimensionality of the features is high, solving these linear systems becomes the main computational bottleneck. This project aims at scaling the method up to very large sparse problems by reducing the costs associated with the linear system solver. The bottlenecks are (1) the number of Gibbs iterations, (2) the number of right-hand sides to which the linear solver is applied in each Gibbs iteration, and (3) the number of iterations of the linear solver (which increases further with the size of the data set).
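To make the connection between Gibbs sampling and linear system solves concrete, the following is a minimal sketch (not the project's actual implementation) of one Gibbs update in a BPMF-style model: each entity's latent vector is drawn from its Gaussian conditional, whose mean requires one small linear solve per entity. All sizes, the rating matrix, and the hyperparameters alpha and lam are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import cholesky, solve

rng = np.random.default_rng(0)

# Toy sizes (assumptions, not from the project): 50 users, 40 items, rank-5 factors
n_users, n_items, k = 50, 40, 5
alpha, lam = 2.0, 1.0  # observation precision and prior precision (illustrative)

# Sparse ratings stored as a dense matrix with NaN marking missing entries
R = np.full((n_users, n_items), np.nan)
mask = rng.random((n_users, n_items)) < 0.2
R[mask] = rng.integers(1, 6, size=mask.sum())

V = rng.standard_normal((n_items, k))  # current item factors

def sample_user_factors(R, V, alpha, lam, rng):
    """One Gibbs step: draw each user's latent vector from its Gaussian
    conditional, costing one k-by-k linear solve (per right-hand side)."""
    n_users, k = R.shape[0], V.shape[1]
    U = np.empty((n_users, k))
    for i in range(n_users):
        obs = ~np.isnan(R[i])                      # items this user rated
        Vi = V[obs]                                # factors of observed items
        A = alpha * Vi.T @ Vi + lam * np.eye(k)    # posterior precision matrix
        b = alpha * Vi.T @ R[i, obs]               # unnormalized posterior mean
        L = cholesky(A)                            # A = L @ L.T
        mean = solve(A, b)
        # mean + L^{-T} z is an exact draw from N(mean, A^{-1})
        U[i] = mean + solve(L.T, rng.standard_normal(k))
    return U

U = sample_user_factors(R, V, alpha, lam, rng)
print(U.shape)  # (50, 5)
```

In this toy setting the per-user systems are only k-by-k and a direct solve is cheap; the project's setting corresponds to the regime where the systems become large (high-dimensional features) and iterative solvers with many right-hand sides take over as the dominant cost.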