The Rheem project has been selected for a demo presentation at SIGMOD 2016 (abstract below). The submission, titled "Rheem: Enabling Multi-Platform Task Execution", was authored by Sebastian Kruse and the Data Analytics group at the Qatar Computing Research Institute. Because virtually every data processing platform is subject to the "one size does not fit all" limitation, the Rheem project researches how to translate and optimize platform-agnostic data analytics programs for execution on diverse platforms, or a combination of them, such as Apache Spark, relational databases, and machine learning engines. Find more details on the project page.
Abstract: Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases Rheem, a framework that provides multi-platform task execution for such applications. It features a three-layer data processing abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of Rheem by using real-world scenarios from three different applications, namely, machine learning, data cleaning, and data fusion.