
Project

Timing issues in distributed database stream processing. (R-5472)

A database stream is a continuously growing sequence of data items. For example, if the input stream consists of road congestion measurements over time, a traffic management system can issue dynamic speed limits in an output stream. Implementing stream processing systems is challenging because incoming data often cannot be stored, owing to its high velocity and volume. Instead, on-the-fly processing is used, in which only data items in the same time window are correlated, e.g., items whose timestamps differ by at most ten seconds. The system is also often distributed, so that the load is shared among parallel computing nodes. Many languages and distributed execution mechanisms exist for stream processing. The programmer typically expects some maximum delay on the output items; we call this the desired output timing. Unfortunately, execution mechanisms are subject to internal system delays caused by hardware, node scheduling policies, and asynchronous messages in distributed settings. Such delays may cause the actual output timing to lag behind the desired output timing. Timing can even degrade over time, so that outputs are produced more and more slowly. Because this timing mismatch is not well understood, we aim for a natural theory that clarifies how good or how bad the actual output timing of a system is compared to the desired output timing. Concretely, by defining approximations of the desired output timing, we seek practical insights into the relationship between system speed and timing accuracy.
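
As an illustration only, the windowed correlation and the notion of a late output described above might be sketched as follows. This is a minimal single-node sketch, not the project's formalism: the function names `correlate_within_window` and `lateness`, the tuple layout, and the assumption that items arrive in timestamp order are all hypothetical choices made for the example.

```python
from collections import deque


def correlate_within_window(items, window=10.0):
    """Pair each arriving item with earlier items at most `window` seconds older.

    Assumption of this sketch: `items` is an iterable of (timestamp, value)
    tuples in non-decreasing timestamp order.
    """
    buffer = deque()  # holds only the items still inside the window
    pairs = []
    for ts, value in items:
        # Evict items too old to be correlated with the new arrival,
        # so the full stream never has to be stored.
        while buffer and ts - buffer[0][0] > window:
            buffer.popleft()
        for old_ts, old_value in buffer:
            pairs.append(((old_ts, old_value), (ts, value)))
        buffer.append((ts, value))
    return pairs


def lateness(input_time, output_time, max_delay):
    """Amount by which an output exceeds the desired maximum delay (0.0 if on time)."""
    return max(0.0, (output_time - input_time) - max_delay)


# Hypothetical congestion measurements: (seconds, label).
measurements = [(0.0, "a"), (4.0, "b"), (12.0, "c"), (30.0, "d")]
pairs = correlate_within_window(measurements)
```

With a ten-second window, only the neighbouring measurements `a`/`b` and `b`/`c` are paired here. `lateness` compares an actual output time against a desired maximum delay; a distributed system would add scheduling and messaging delays that this sketch does not model.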
Date: 1 Oct 2014 → 31 Aug 2016
Keywords: DATABASE THEORY
Disciplines: Applied mathematics in specific fields, Artificial intelligence, Cognitive science and intelligent systems