Stream processing systems have gained popularity due to developments such as the Internet of Things and Industry 4.0. Enterprises benefit from this technology by augmenting their business data with up-to-date streaming information in a stream processing architecture. The performance characteristics of such an architecture, e.g., latency or throughput, vary with the data stream processing system, its configuration, the hardware setup, and other environmental factors. A common way of analyzing and evaluating a system's performance is benchmarking. Despite the growing variety of stream processing frameworks, a satisfactory benchmark is still lacking. This is especially true for the enterprise context, i.e., when combining business or historical data with streaming data. My research investigates how a performance benchmark for streaming architectures should be designed and implemented.
In particular, it presents a novel benchmark for enterprise streaming architectures. This includes the design of a benchmark process that ensures objective comparisons. As part of the benchmark, we developed a comprehensive toolkit with a focus on usability, which supports benchmark execution and calculates benchmark results independently of the stream processing system used. The process design also comprises the definition of real-world benchmark queries that are validated in industry and need to be implemented on the system under test. These queries cover the core functionalities of stream processing systems as defined in the literature. To validate the proposed benchmark, we will analyze selected systems, including Apache Flink and Apache Spark Streaming.
Keywords: Data Stream Processing, Performance Benchmarking, Industry 4.0, Internet of Things