Project

A declarative approach to optimizing massively parallel data processing (R-8196)

Database research has witnessed a renewed interest in parallel data processing. While distributed and parallel data management systems have been around for quite some time, the rise of cloud computing and the advent of big data present new challenges. Nowadays, parallelism is no longer restricted to a handful of servers but is massive, ranging from hundreds to tens of thousands of computing nodes. Queries are not limited to simple keyword search but involve complex join queries over multiple database tables in support of large-scale data analytics. Furthermore, performance is no longer dominated by the number of I/O requests to external memory, as in traditional systems, but by the communication cost of reshuffling data over the network during query execution. This shift calls for novel techniques for analyzing and optimizing complex queries in the massively parallel setting. Unfortunately, the rise of many different systems, each with its own characteristics, has led to a proliferation of ad hoc, specialized techniques that are difficult to transfer from one system to another. In this work, I want to develop a uniform approach to the optimization of queries in massively parallel systems. In particular, my research proposal has the following objectives: (1) develop a declarative framework for massively parallel data processing; (2) study decision problems in support of static analysis of queries in this framework; (3) develop general techniques for multi-query optimization.
Date: 1 Oct 2017 → 30 Sep 2019
Keywords: Big data, databases, Parallel data processing, Query optimiza
Disciplines: Applied mathematics in specific fields