

Data partitioning for single-round multi-join evaluation in massively parallel systems

Journal contribution - Journal article

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is growing interest in single-round multiway join algorithms, where data is first reshuffled over many servers and then evaluated in parallel without further communication. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries with respect to a distribution policy. We provide a semantic characterization of when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem. Motivated by workload optimization scenarios, we further consider the problem of parallel-correctness transfer from a query Q to a query Q', that is, whether Q' is parallel-correct under every distribution policy for which Q is parallel-correct. If so, Q' can always be evaluated after Q without repartitioning the data. We give a semantic characterization of parallel-correctness transfer, together with matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.
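To make the single-round model concrete, here is a minimal Python sketch, not taken from the paper. It assumes three binary relations R, S, T, a Hypercube-style policy for the triangle query Q(x, y, z) :- R(x, y), S(y, z), T(z, x) with a share of 2 per variable (8 servers), and a hypothetical hash function `h(v) = v % 2`. The helper names (`hypercube_policy`, `one_round`, `triangle`, `path`) are illustrative only. The sketch reshuffles the facts, evaluates the query locally on each server, and compares the union of the local answers with the centralized answer on a single instance.

```python
from itertools import product

P = 2  # hypothetical share per variable: 2**3 = 8 servers

# Each server is a coordinate (x, y, z), one hash bucket per query variable.
SERVERS = list(product(range(P), repeat=3))


def h(v):
    # Hypothetical hash function: integer value modulo the share P.
    return v % P


def hypercube_policy(fact):
    """Servers receiving a fact under a Hypercube-style policy for the triangle query."""
    rel, (a, b) = fact
    if rel == "R":  # R(x, y): fixes the x- and y-coordinates, z is free
        return {s for s in SERVERS if s[0] == h(a) and s[1] == h(b)}
    if rel == "S":  # S(y, z): fixes y and z, x is free
        return {s for s in SERVERS if s[1] == h(a) and s[2] == h(b)}
    if rel == "T":  # T(z, x): fixes z and x, y is free
        return {s for s in SERVERS if s[2] == h(a) and s[0] == h(b)}
    return set()


def triangle(facts):
    # Q(x, y, z) :- R(x, y), S(y, z), T(z, x)
    R = {t for rel, t in facts if rel == "R"}
    S = {t for rel, t in facts if rel == "S"}
    T = {t for rel, t in facts if rel == "T"}
    return {(x, y, z) for (x, y) in R for (y2, z) in S
            if y == y2 and (z, x) in T}


def path(facts):
    # Q'(x, z) :- R(x, y), R(y, z)
    R = {t for rel, t in facts if rel == "R"}
    return {(x, z) for (x, y) in R for (y2, z) in R if y == y2}


def one_round(query, facts, policy):
    # Reshuffle: every server receives the facts the policy assigns to it.
    local = {s: set() for s in SERVERS}
    for f in facts:
        for s in policy(f):
            local[s].add(f)
    # Evaluate on each server without communication; union the local answers.
    result = set()
    for s in SERVERS:
        result |= query(local[s])
    return result


facts = {("R", (1, 2)), ("S", (2, 3)), ("T", (3, 1)), ("R", (2, 4))}
print(one_round(triangle, facts, hypercube_policy) == triangle(facts))  # True
print(one_round(path, facts, hypercube_policy) == path(facts))          # False
```

For the triangle query, the distributed answer matches the centralized one, as expected: every valuation's three facts meet at server (h(x), h(y), h(z)). The path query Q' misses the answer (1, 4) because R(1, 2) and R(2, 4) never share a server. So Q' is not parallel-correct under this policy, and parallel-correctness does not transfer from Q to Q' here. A single instance can refute parallel-correctness but cannot establish it, since the condition quantifies over all instances.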
Journal: SIGMOD RECORD
ISSN: 0163-5808
Issue: 1
Volume: 45
Pages: 33 - 40
Number of pages: 8
Year of publication: 2016