Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
  
 

Publications

We try to keep an up-to-date list of all our publications. If you are interested in a PDF that we have not uploaded yet, feel free to email us for a copy. Recent publications are listed below. For older ones, please select the appropriate year.

Publications of the years 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007

Big Data Generation

Rabl, Tilmann; Jacobsen, Hans-Arno in Specifying Big Data Benchmarks - First Workshop, WBDB 2012, San Jose, CA, USA, May 8-9, 2012, and Second Workshop, WBDB 2012, Pune, India, December 17-18, 2012, Revised Selected Papers, pages 20-27. 2012.

Big data challenges are end-to-end problems. When big data is handled, it usually has to be preprocessed, moved, loaded, processed, and stored many times. This has led to the creation of big data pipelines. Current benchmarks related to big data focus only on isolated aspects of this pipeline, usually the processing, storage, and loading aspects. To date, no benchmark has been presented that covers the end-to-end aspect of big data systems. In this paper, we discuss the necessity of ETL-like tasks in big data benchmarking and propose the Parallel Data Generation Framework (PDGF) for its data generation. PDGF is a generic data generator that was implemented at the University of Passau and is currently adopted in TPC benchmarks.
Tags: WBDB