Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

The Enterprise Stream Processing Benchmark

Motivation

Why another benchmark?

The ever-increasing amount of data produced today, from smart homes to smart factories, gives rise to new challenges and opportunities. Terms like "Internet of Things" (IoT) and "Big Data" have gained traction to describe the creation and analysis of these new masses of data. Furthermore, new technologies and systems have been developed that can handle and analyze data streams, i.e., data arriving at high frequency and in large volume.

In recent years, for example, many distributed data stream processing systems have been developed; using them is one way of analyzing data streams. Although a broad variety of systems and system architectures is generally a good thing, the bigger the choice, the harder it is to choose.

Benchmarking is a common and proven approach to identifying the best system for a specific set of needs. However, no satisfactory benchmark for modern data stream processing architectures currently exists. Existing benchmarks fall short particularly in an enterprise context, i.e., where data streams have to be combined with historical and transactional data. The Enterprise Streaming Benchmark (ESB), which is to be developed, will attempt to tackle this issue.

Objectives

We aim to create a relevant, real-world application benchmark with a focus on data stream processing architectures in an enterprise context. This involves including business or transactional data stored in a traditional database in the analysis process. To make the benchmark easier to use, we will develop a comprehensive toolkit that supports implementation as well as setup and execution. In particular, it will comprise tools for system setup, data ingestion, data validation, and benchmark result analysis.

With regard to the domain, we will focus on an industrial manufacturing context, so the defined benchmark queries will answer questions in that area. Our primary data source will be sensor data from, e.g., industrial machinery, and we aim to work with real-world data.

Selected Publications

  • ESPBench: The Enterprise Stream Processing Benchmark. Hesse, Guenter; Matthies, Christoph; Perscheid, Michael; Uflacker, Matthias; Plattner, Hasso (2021). 201–212.
  • How Fast Can We Insert? An Empirical Performance Evaluation of Apache Kafka. Hesse, Guenter; Matthies, Christoph; Uflacker, Matthias (2021). 641–648.
  • Application of Data Stream Processing Technologies in Industry 4.0: What is Missing? Hesse, Guenter; Sinzig, Werner; Matthies, Christoph; Uflacker, Matthias (2019). 304–310.
  • Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems. Hesse, Guenter; Matthies, Christoph; Glass, Kelvin; Huegle, Johannes; Uflacker, Matthias (2019). 1381–1392.
  • Adding Value by Combining Business and Sensor Data: An Industry 4.0 Use Case. Hesse, Guenter; Matthies, Christoph; Sinzig, Werner; Uflacker, Matthias. G. Li, J. Yang, J. Gama, J. Natwichai, Y. Tong (eds.) (2019). (Vol. 11448) 528–532.
  • Senska - Towards an Enterprise Streaming Benchmark. Hesse, Guenter; Reissaus, Benjamin; Matthies, Christoph; Lorenz, Martin; Kraus, Milena; Uflacker, Matthias (2018). 25–40.
  • A New Application Benchmark for Data Stream Processing Architectures in an Enterprise Context: Doctoral Symposium. Hesse, Guenter; Matthies, Christoph; Reissaus, Benjamin; Uflacker, Matthias in DEBS ’17 (2017). 359–362.
  • Conceptual Survey on Data Stream Processing Systems. Hesse, Guenter; Lorenz, Martin in IEEE International Conference on Parallel and Distributed Systems (ICPADS) (2015). 797–802.