Analysis of TPCx-IoT: The First Industry Standard Benchmark for IoT Gateway Systems. Poess, Meikel; Nambiar, Raghunath; Kulkarni, Karthik; Narasimhadevara, Chinmayi; Rabl, Tilmann; Jacobsen, Hans-Arno (2018). 1519–1530.
By 2020 it is estimated that 20 billion devices will be connected to the Internet. While the initial hype around this Internet of Things (IoT) stems from consumer use cases, the number of devices and data from enterprise use cases is significant in terms of market share. With companies being challenged to choose the right digital infrastructure from different providers, there is a pressing need to objectively measure the hardware, operating system, data storage, and data management systems that can ingest, persist, and process the massive amounts of data arriving from sensors (edge devices). The Transaction Processing Performance Council (TPC) recently released the first industry standard benchmark for measuring the performance of gateway systems, TPCx-IoT. In this paper, we provide a detailed description of TPCx-IoT, mention design decisions behind key elements of this benchmark, and experimentally analyze how TPCx-IoT measures the performance of IoT gateway systems.
PolyBench: The First Benchmark for Polystores. Karimov, Jeyhun; Rabl, Tilmann; Markl, Volker (2018). 24–41.
Modern business intelligence requires data processing not only across a huge variety of domains but also across different paradigms, such as relational, stream, and graph models. This variety is a challenge for existing systems, which typically support only a single or a few different data models. Polystores were proposed as a solution for this challenge and received wide attention both in academia and in industry. These are systems that integrate different specialized data processing engines to enable fast processing of a large variety of data models. Yet, there is no standard to assess the performance of polystores. The goal of this work is to develop the first benchmark for polystores. To capture the flexibility of polystores, we focus on high-level features in order to enable an execution of our benchmark suite on a large set of polystore solutions.
The Berlin Big Data Center (BBDC). Boden, Christoph; Rabl, Tilmann; Markl, Volker in it-Information Technology (2018). 60(5-6) 321–326.
The last decade has been characterized by the collection and availability of unprecedented amounts of data due to rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. In order to process and analyze this data deluge, novel distributed data processing systems resting on the paradigm of data flow such as Apache Hadoop, Apache Spark, or Apache Flink were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, prohibiting large groups of data scientists and analysts from efficiently using this technology. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable declarative specification and the automatic parallelization of data analysis programs, the PEEL Framework for transparent and reproducible benchmark experiments of distributed data processing systems, approaches to foster the interpretability of machine learning models, and finally provide an overview of the challenges to be addressed in the second phase of the BBDC.