Until the turn of the century, computer systems became faster without any particular effort on the software side, simply because the hardware they ran on increased its clock speed with every new release. But this free lunch is over! Today's CPU clock rates stall at around 3 GHz, and software developers need to break new ground to make their products faster. The most popular approach is to design software with parallelization and distributed computing in mind, because the number of computing elements (transistors, cores, CPUs, GPUs, cluster nodes, etc.) in modern computer systems still grows steadily.
Big Data analytics and engineering are both multi-million dollar markets that grow constantly. Data, and the ability to control and use it, is among the most valuable assets of today's computer systems. Because data volumes grow rapidly, and with them the complexity of the questions the data should answer, both data engineering, i.e., the process of shaping and transforming data, and data analytics, i.e., the extraction of information from data, become increasingly difficult. In particular, neither of these data-centric disciplines can hope for faster hardware to solve their performance problems: they need to embrace software techniques that let their performance scale with the still growing number of processing elements.
This general paradigm shift in software development, however, introduces a number of challenges that must be solved to build algorithms and systems that execute efficiently on many, possibly independent and heterogeneous, computing elements. Some of these challenges involve the following questions:
How can the distributed algorithm or system …
- utilize all available resources in an optimal way?
- deal with the increased susceptibility to failures of a parallel/distributed system?
- support elasticity, i.e., sets of computing resources that change at runtime?
- control its resource consumption in terms of overall memory and CPU usage?
- ensure reliable state and data storage?
- start and terminate in a clean and secure way?
- be debugged, monitored, and profiled?
Certain frameworks for parallel/distributed programming, such as Spark, Flink, and Storm, already address several of these questions, but they enforce a certain programming model that does not fit all computationally complex tasks. Other distributed computing paradigms, such as message passing and actor programming (see, for instance, Akka, Orleans, or Erlang), leave these questions to the programmer, but in turn offer much more flexibility for algorithmic design.
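To make the contrast concrete, the following minimal sketch illustrates the kind of fixed dataflow programming model that a framework like Spark imposes: the computation is expressed as a pipeline of transformations over distributed collections, while the framework takes care of partitioning, scheduling, and fault tolerance. The example assumes a local Spark setup, and the input file name `input.txt` is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Minimal word-count sketch in Spark's dataflow model (illustrative only).
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")          // use all local cores; a cluster URL would go here instead
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("input.txt")                 // distributed collection of lines (placeholder path)
      .flatMap(_.split("\\s+"))              // transformation: split lines into words
      .map(word => (word, 1))                // transformation: pair each word with a count of 1
      .reduceByKey(_ + _)                    // shuffle and aggregation handled by the framework

    counts.collect().foreach(println)        // action: materialize the result on the driver
    spark.stop()
  }
}
```

An actor framework such as Akka, by contrast, would leave the partitioning of the input, the placement of counting actors, and the handling of failures to the programmer, which is more work but admits algorithm designs that do not fit this map/shuffle/reduce shape.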
In this research area, we investigate various data engineering and data analytics domains to identify their computationally complex problems and then solve them with scalable and elastic approaches. We study general challenges in writing distributed systems, but also tackle use-case-specific computational tasks that do not yet have trivial distributed solutions.