Data compression for probabilistic logics
Learning propositional logical theories from examples is one of the quintessential problems in machine learning, as well as the most basic setting for inductive logic programming and logical and relational learning. One of the most fascinating aspects of logic learning is predicate invention: the ability to automatically introduce new predicates into the language that not only compress the data, but also provide insights into the underlying regularities.

In this thesis, we will investigate data compression and predicate invention techniques in the context of probabilistic logics, that is, in a setting that explicitly supports uncertainty at the level of both the data and the regularities described. This setting is interesting both in its own right, as it allows for new ways to obtain insights into uncertain, structured domains, and in the broader context of inference approaches for probabilistic logics. Indeed, state-of-the-art inference techniques for probabilistic logic languages, including DTAI's probabilistic logic programming language ProbLog, are based on a reduction to weighted model counting, where weighted propositional CNF formulas are the central data structure. The size of these formulas is a key limiting factor, and being able to compress them is therefore a promising direction to further advance the state of the art.

This thesis will develop data compression techniques for probabilistic logics in two phases. In the first phase, we will focus on the fundamental case of compressing weighted propositional CNF formulas, including those generated during ProbLog inference. In the second phase, we will build upon the insights obtained in the first phase to develop techniques that avoid the potential bottleneck of constructing the full uncompressed CNF first, and instead operate directly on the level of relational representations.
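To make the weighted-model-counting reduction concrete, the following is a minimal sketch, not ProbLog's actual implementation: it computes the weighted model count of a small weighted propositional CNF by exhaustive enumeration (practical systems instead compile the CNF to structures such as d-DNNFs or BDDs, which is precisely why formula size matters). The clause encoding, weight table, and the two-variable example program are illustrative assumptions.

```python
from itertools import product

def weighted_model_count(clauses, weights):
    """Exhaustive weighted model count of a propositional CNF.

    clauses: list of clauses, each a list of DIMACS-style literals
             (v means variable v is true, -v means it is false).
    weights: dict mapping each variable v to a (w_false, w_true) pair.
    A model's weight is the product of the weights of its literals;
    the WMC is the sum of weights over all satisfying assignments.
    """
    variables = sorted(weights)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        # A CNF is satisfied iff every clause has at least one true literal.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            w = 1.0
            for v, val in assignment.items():
                w *= weights[v][1] if val else weights[v][0]
            total += w
    return total

# Hypothetical two-fact program: 0.1::burglary (var 1), 0.2::earthquake
# (var 2), and alarm holding if either is true.  The query P(alarm)
# corresponds to the WMC of the single clause (1 or 2):
# 1 - 0.9 * 0.8 = 0.28.
cnf = [[1, 2]]
w = {1: (0.9, 0.1), 2: (0.8, 0.2)}
print(weighted_model_count(cnf, w))
```

The enumeration is exponential in the number of variables, which illustrates the thesis's point from the other direction: compilation-based inference avoids enumeration, but its cost is governed by the size of the CNF, so compressing the formula directly attacks the bottleneck.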