Prof. Dr. Felix Naumann


Stratosphere is a joint DFG project conducted by the Technische Universität Berlin, Humboldt Universität Berlin, and the Hasso-Plattner-Institut. It explores how the elasticity of clouds can be exploited for processing analytic queries massively in parallel. Unlike most traditional DBMS, Stratosphere inherently supports text-based and semi-structured data.

Official Project Site

The sub-projects at HPI focus on data quality improvements of linked open data, efficient and scalable data profiling, and knowledge discovery.

Data Cleansing

We define the declarative data cleansing language Meteor, implement the underlying basic operations, and develop cost estimates for these operations. Furthermore, we provide test data sets and example queries to evaluate the efficiency and effectiveness of the data cleansing process.

Data Profiling

Detecting dependencies in ever-growing amounts of data has a high computational complexity. One way to cope with this complexity is to distribute the computational work among multiple interconnected computers. However, most existing data profiling algorithms are designed to run on a single machine rather than in parallel on a computer cluster. Therefore, we research distributed modifications of existing algorithms as well as new algorithms that can be executed efficiently on computer clusters and that scale out with the number of cluster nodes.
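To illustrate how a dependency-detection task can be split into steps that parallelize across nodes, here is a minimal sketch in Python of a map/reduce-style search for unary inclusion dependencies (column A is included in column B if every value of A also occurs in B). It is an illustration under stated assumptions, not the project's implementation: the function names, the dictionary-based partition format, and the single-process execution are all hypothetical stand-ins for what a cluster framework would provide.

```python
from collections import defaultdict


def map_cells(partition):
    """Map step: emit a (value, column) pair for every cell of one partition.

    A partition is assumed to be a dict mapping column names to lists of values;
    on a cluster, each node would run this step on its local partitions.
    """
    for column, values in partition.items():
        for value in values:
            yield value, column


def group_by_value(pairs):
    """Shuffle step: collect, for each value, the set of columns containing it."""
    groups = defaultdict(set)
    for value, column in pairs:
        groups[value].add(column)
    return groups


def unary_inds(partitions):
    """Reduce step: intersect, per column, the co-occurring columns of all its values.

    Column `dep` is included in column `ref` exactly when `ref` appears in the
    column set of every value of `dep`. Columns without values are ignored here.
    """
    pairs = (pair for partition in partitions for pair in map_cells(partition))
    candidates = {}
    for columns in group_by_value(pairs).values():
        for column in columns:
            others = columns - {column}
            if column in candidates:
                candidates[column] &= others
            else:
                candidates[column] = set(others)
    return {(dep, ref) for dep, refs in candidates.items() for ref in refs}


if __name__ == "__main__":
    # Two partitions of the same (hypothetical) data set.
    partitions = [
        {"A": [1, 2], "B": [1, 2, 3]},
        {"A": [2], "C": [3, 4]},
    ]
    print(unary_inds(partitions))  # prints {('A', 'B')}
```

The map and shuffle steps are independent per cell and per value, which is what makes this formulation a candidate for scale-out; the final intersection needs all column sets for a given column, so in a real deployment it would require a second grouping by column.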

Knowledge Discovery

Driven by applications such as social media analytics, Web search, advertising, recommendation, mobile sensing, genomic sequencing, and astronomical observations, the need for scalable learning, mining, and knowledge discovery methods is steadily growing. Often the challenge is to automatically process and analyze terabytes of evolving data. Extracting value from such data (e.g., understanding the underlying structure and making predictions) before it becomes outdated is a major concern. Therefore, the goal is to make such applications scalable on the basis of Stratosphere.

Please contact Felix Naumann, Toni Grütze (Knowledge Discovery on Stratosphere), or Sebastian Kruse (Data Profiling on Stratosphere) for further questions.

Former members


Gruetze, Toni and Krestel, Ralf and Lazaridou, Konstantina and Naumann, Felix
In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3-7 April 2017, ACM.
Gruetze, Toni and Kasneci, Gjergji and Zuo, Zhe and Naumann, Felix
Web Semantics: Science, Services and Agents on the World Wide Web, vol. 37(C):75–89, March 2016
Gruetze, Toni and Krestel, Ralf and Naumann, Felix
In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB), volume 9612, pages 213–221, June 2016, Springer.
Thorsten Papenbrock, Arvid Heise, Felix Naumann
IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27(5):1316-1329 2015
Astrid Rheinländer, Arvid Heise, Fabian Hueske, Ulf Leser, Felix Naumann
Information Systems, vol. 52:96-125 2015
Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
In Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW), 2015
Toni Gruetze, Gary Yao, Ralf Krestel
In Proceedings of the 24th International Conference on World Wide Web Companion (WWW), pages 1333–1338, May 2015, ACM.
Tobias Vogel, Arvid Heise, Uwe Draisbach, Dustin Lange, Felix Naumann
JDIQ, vol. 5(1-2) 2014
Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias J. Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, Daniel Warneke
The VLDB Journal, vol. 23(6):939-964 2014
Astrid Rheinländer, Martin Beckmann, Anja Kunkel, Arvid Heise, Thomas Stoltmann, Ulf Leser
In Proceedings of the SIGMOD conference, pages 685-688, 2014
Arvid Heise, Gjergji Kasneci, Felix Naumann
In Proceedings of the Conference on Information and Knowledge Management (CIKM), pages 959-968, 2014
Marcus Leich, Jochen Adamek, Moritz Schubotz, Arvid Heise, Astrid Rheinländer, Volker Markl
In Database Systems for Business, Technology, and Web (BTW), 2013
Arvid Heise, Jorge-Arnulfo Quiane-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann
In Proceedings of the VLDB Endowment (PVLDB), Hangzhou, China, 2013. Jorge's presentation at VLDB 2014 was awarded the "Excellent Presentation Award".
Astrid Rheinländer, Arvid Heise, Fabian Hueske, Ulf Leser, Felix Naumann
Arvid Heise, Astrid Rheinländer, Marcus Leich, Ulf Leser, Felix Naumann
In Proceedings of the International Workshop on End-to-end Management of Big Data (BigData) in conjunction with VLDB 2012, Istanbul, Turkey, 2012
Christoph Böhm, Markus Freitag, Arvid Heise, Claudia Lehmann, Andrina Mascher, Felix Naumann, Mauricio Hernandez, Vuk Ercegovac, Peter Haase
In Proceedings of the International World Wide Web Conference (WWW), Lyon, France, 2012