Publication

Distribution Constraints: The Chase for Distributed Data

Book contribution - Book chapter, conference contribution

Distributed storage and processing of data has been used and studied since the 1970s and has become increasingly important in recent years. One of the most fundamental questions in distributed data management is the following: how should data be replicated and partitioned over the set of computing nodes? It is paramount to answer this question well, as the placement of data determines the reliability of the system and is furthermore critical for its scalability, including the performance of query processing. Despite the importance of this question and decades of research, placement strategies remained rather simple for a long time: horizontal or vertical fragmentation of relations, or hybrid variants thereof [37]. These placement strategies, which are commonly based on range or hash partitioning of the relevant attributes, often require a reshuffling of the data for each binary join in the processed query. Recently, however, more elaborate data placement schemes such as co-partitioning, single hypercubes (for multiway joins), and multiple hypercubes (for skewed data) have gained some attention [3, 12, 30, 39, 41, 45].

Book: Proceedingsbook 23rd International Conference on Database Theory (ICDT 2020)

Series: Leibniz International Proceedings in Informatics (LIPIcs)

Pages: 13:1 - 13:19

Number of pages: 19

Year of publication: 2020

Keywords: tuple-generating dependencies, chase, conjunctive queries, distributed evaluation

Handle: http://hdl.handle.net/1942/33336
DOI: https://doi.org/10.4230/lipics.icdt.2020.13
ArticleNumber: 13

Accessibility: Open
