Project

A formal approach to querying big data (R-4633)

In recent years, "big data" has become a prominent buzzword. The term generally refers to a context where gigabytes constitute the unit size for measuring data volumes, where terabytes are commonly encountered, and where many Web companies and scientific or financial institutions must deal with petabytes of information. In response to pressing practical needs, a variety of systems has arisen for handling big data, such as MapReduce, which was introduced by Google and gained widespread adoption through the open-source implementation Hadoop, together with numerous proposed extensions. These developments also led to a revival of parallel database management systems and an increased interest in so-called NoSQL data stores. While progress in database research has led to a deep understanding of traditional non-distributed data models and sequential querying, a similar understanding of big data computation is missing, and initial models are only sparsely developed. Given the number of competing systems and their diversity, it remains unclear which system is best suited to which kind of query. This work therefore aims to develop and study computational models for big data in order to provide insight into the use of existing systems and to formulate possible improvements. This research proposal has two main objectives: (1) development of a computational complexity framework suited to big data; and (2) development and study of query and transformation languages for big data within this framework.
Date: 1 Oct 2013 → 30 Sep 2017
Keywords: SCIENTIFIC DATA MANAGEMENT
Disciplines: Applied mathematics in specific fields