Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl

Publications

We try to keep an up-to-date list of all our publications. If you are interested in a PDF that we have not uploaded yet, feel free to email us for a copy. Recent publications are listed below; for older ones, please select the appropriate year.

Publications of the years 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007

ScootR: Scaling R Dataframes on Dataflow Systems

Kunft, Andreas; Stadler, Lukas; Bonetta, Daniele; Basca, Cosmin; Meiners, Jens; Breß, Sebastian; Rabl, Tilmann; Fumero, Juan José; Markl, Volker. In Proceedings of the ACM Symposium on Cloud Computing (SoCC 2018), Carlsbad, CA, USA, October 11-13, 2018, pages 288-300. 2018.

To cope with today’s large scale of data, parallel dataflow engines such as Hadoop, and more recently Spark and Flink, have been proposed. They offer scalability and performance, but require data scientists to develop analysis pipelines in unfamiliar programming languages and abstractions. To overcome this hurdle, dataflow engines have introduced some forms of multi-language integrations, e.g., for Python and R. However, this results in data exchange between the dataflow engine and the integrated language runtime, which requires inter-process communication and causes high runtime overheads. In this paper, we present ScootR, a novel approach to execute R in dataflow systems. ScootR tightly integrates the dataflow and R language runtime by using the Truffle framework and the Graal compiler. As a result, ScootR executes R scripts directly in the Flink data processing engine, without serialization and inter-process communication. Our experimental study reveals that ScootR outperforms state-of-the-art systems by up to an order of magnitude.
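To make the data-exchange overhead described in the abstract more concrete, here is a minimal Python sketch. It assumes a toy per-record function (the hypothetical `udf`) and uses `pickle` inside a single process as a stand-in for the serialization step of a multi-language bridge. It does not perform real inter-process communication and is not ScootR's implementation (ScootR uses Truffle and Graal to run R inside Flink). It only contrasts a path that serializes every record in both directions with a path that calls the function directly on the engine's objects.

```python
import pickle


def udf(record):
    # Hypothetical user-defined function, standing in for an R function applied per record.
    return {"id": record["id"], "total": record["price"] * record["qty"]}


def run_with_serialization(records, fn):
    """Loosely integrated style: each record is serialized on its way to the
    guest runtime and the result is serialized on its way back.
    Returns the results and the total number of serialized bytes."""
    results = []
    bytes_moved = 0
    for rec in records:
        payload = pickle.dumps(rec)           # engine -> guest runtime
        bytes_moved += len(payload)
        out = fn(pickle.loads(payload))       # guest runtime executes the function
        reply = pickle.dumps(out)             # guest runtime -> engine
        bytes_moved += len(reply)
        results.append(pickle.loads(reply))
    return results, bytes_moved


def run_in_process(records, fn):
    """Tightly integrated style: the function is called directly on the
    engine's objects, so no bytes are serialized."""
    return [fn(rec) for rec in records], 0


if __name__ == "__main__":
    records = [{"id": i, "price": 2.0, "qty": i} for i in range(1000)]
    a, serialized = run_with_serialization(records, udf)
    b, _ = run_in_process(records, udf)
    print(a == b, serialized)
```

Both paths produce the same results; the difference is the per-record serialization work (and, in a real system, the inter-process transfer) that the first path pays and the second avoids.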
Tags: SoCC