Completed master's theses
Listed here are student theses completed since the winter semester 2005/2006. All master's theses are available as PDF files upon request. Please contact office-naumann.
2026
| Topic | Student | Advisors | Last seen as/at |
| DV-Tree: A Multi-Version Index for Functional and Order Dependency Validation | Jonas Baltruschat | Felix Naumann, Daniel Lindner | |
| Cardinality Estimation with Data Dependencies | Paul Rößler | Felix Naumann, Daniel Lindner, Martin Boissier | Amazon, Berlin |
| Using Coordinated Probabilistic Filters to Reduce Intermediate Query Results | Tobias Jordan | Tilmann Rabl, Martin Boissier, Daniel Lindner | |
| Datenzentrische KI-Methoden zur Schadensvorhersage in Fahrzeugtransportnetzwerken | Finn Freiheit | Felix Naumann | |
| Incremental Order Dependency Discovery | Paul Sieben | Felix Naumann, Youri Kaminsky | |
| Incorporating Uncertainty in the Data Quality Assessment Process | Jannis Berndt | Lisa Ehrlinger, Felix Naumann | |
2025
| Topic | Student | Advisors | Last seen as/at |
| ChunkQuest: Navigating the Multimodal Landscape for Retrieval-augmented Generation | David Kuska | Felix Naumann, Lukas Laskowski, Francesco Pugnaloni | Machine Learning Engineer at Berliner Energie und Wärme |
| Detection of Subsequence Anomalies in Univariate Time Series with Convolutional Kernels | Stefan Spangenberg | Felix Naumann, Sebastian Schmidl | |
| Fully Unsupervised Entity Resolution with Fine-Tuned Language Embeddings | Florian Sold | Felix Naumann, Fabian Panse | Innovation Tech Lead at Horizon56 |
| Parameter-Efficient Domain Adaptation for Entity Matching | Emanuele De Rossi | Felix Naumann, Matteo Paganelli, Fabian Panse | |
| Meta-Data based Synthesis of Realistic Tabular Data using Pretrained Language Models | Philipp Hildebrandt | Felix Naumann, Fabian Panse | PhD student at HPI |
2024
| Topic | Student | Advisors | Last seen as/at |
| The Effects of Data Quality on Named Entity Recognition Published at WNUT@EACL24 (Best paper award) | Divya Bhadauria (Universität Potsdam) | Felix Naumann, Ralf Krestel, Alejandro Sierra Múnera | PhD student at HPI |
| Mining Larger Persistent Patterns in Temporal Networks | Jakob Edding (HPI) | Felix Naumann, Tobias Bleifuß | |
2023
| Topic | Student | Advisors | Last seen as/at |
| NumbER - Entity Resolution on Numerical Data | Lukas Laskowski | Fabian Panse, Felix Naumann | PhD student at HPI |
| Correlation Anomaly Detection in High-Dimensional Time Series | Niklas Köhnecke | Schmidl/Wenig, Papenbrock, Naumann | TNG Technology Consulting |
| Efficient Discovery of Conditional Unique Column Combinations | Torben Meyer | Youri Kaminsky, Felix Naumann | bakdata |
| Semantic Column Type Detection Using Pretrained Language Models | Jonathan Haas | Felix Naumann, Hazar Harmouch | |
| Quantification of Diversity in Structured Data | Lisa Koeritz | Felix Naumann, Hazar Harmouch | Researcher at Internationales Zentrum für Ethik in den Wissenschaften (IZEW) |
| Leveraging Data Histories to Improve Entity Resolution | Sebastian Apitz | Felix Naumann, Leon Bornemann | kontura GmbH |
2022
| Topic | Student | Advisors | Last seen as/at |
| Estimating Population-Based Completeness in Web Tables | Johanna Schwinn (Universität Potsdam) | Naumann | |
| Der Einfluss von Datenqualitätsmängeln auf Fairness in KI-Systemen | Isabel Bär | Naumann | |
| HYPEX: Explainable Hyperparameter Optimization in Time Series Anomaly Detection published at BTW 2023 | Mats Pörschke | Schmidl, Wenig, Naumann, Papenbrock | Software Engineer at Apple Heidelberg |
| Mona L. = M. Lisa = La Gioconda? Jointly Linking Entities and Extracting Relations with BERT | Lucas Silbernagel (HWR) | Jain, Naumann, Schaal | Analytics Engineer at Knowunity |
| Discovering Similarity Inclusion Dependencies published at SIGMOD 2023 | Youri Kaminsky | Naumann, Pena | PhD student at HPI |
| Discovery of Complementation Dependencies | Jonas Hering | Naumann, Harmouch | Software Engineer at voize |
| Time Series Anomaly Detection: An Aircraft Turbine Case Study | Jacopo Roberto Nicosia (University of Milano-Bicocca) | Schmidl, Wenig, Papenbrock | |
| Efficient Ultrafine-Grained Typing of Named Entities published at JCDL 2023 | Jan Westphal | Naumann, Krestel, Sierra Múnera | ML Engineer at voize |
| Workload-Driven Query Optimization Using Data Dependencies | Daniel Lindner | Felix Naumann, Hasso Plattner, Michael Perscheid, Jan Koßmann, Marcel Weisgut | PhD student at HPI |
2021
| Topic | Student | Advisors | Last seen as/at |
| Diachronic Alignment of Induced Word Senses | Robert Schwanhold | Krestel, Repke | SAP Innovation Center |
| Inferring Regular Expressions from Database Columns | Tobias Niedling | Harmouch, Naumann | |
| Extracting Tables From Plain Text Published at BTW 23 | Leonardo Hübscher | Naumann, Jiang | IT Consultant at Netlight |
| Multi-Aspect Embeddings for Fiction Novels Published at WI 21 | Lasse Kohlmeyer | Krestel, Repke | Capgemini |
| Distributed Duplicate Detection on Streaming Data | Jakob Köhler | Papenbrock | |
| Continuous Learning for Hate Speech Detection | Sezi Dwi Sagarianti Prasetyo (Uni Potsdam) | Krestel | |
| Generating Rap Lyrics with Flow and Rhythm | Noel Danz | Krestel, Repke | |
2020
| Topic | Student | Advisors | Last seen as/at |
| Actor based Database System for Analytical Queries Published at BTW 2021 | Julian Weise | Papenbrock | Snowflake |
| Distributed Graph Based Approximate Nearest Neighbor Search | Juliane Waack | Papenbrock | Snowflake |
| Efficient Distributed Discovery of Bidirectional Order Dependencies Published in The VLDB Journal | Sebastian Schmidl | Papenbrock | HPI |
| Distributed Detection of Sequential Anomalies in Time Related Data Sequences Published in The VLDB Journal | Johannes Schneider | Papenbrock, Wenig | SAP |
| Enriching Document Embeddings With Domain Knowledge Published at NAACL 2021 | Philipp Hager | Krestel, Risch | PhD student at Amsterdam University |
| Distributed Unique Column Combination Discovery | Benjamin Feldmann | Papenbrock, Thomas Bläsius, Martin Schirneck | bakdata |
| Multi-Prototype Diachronic Word Embedding | Maxi Fischer | Krestel, Repke | Sopra Steria |
| Inclusion Dependency Discovery on Streaming Data | Alexander Preuss | Papenbrock | bakdata |
| Reactive Inclusion Dependency Discovery | Frederic Schneider | Papenbrock | Netlight, Berlin |
2019
| Topic | Student | Advisors | Last seen as/at |
| Context-aware Classification of News Comments | Johannes Filter | Krestel, Risch | AlgorithmWatch |
| Jointly Learning Document and Label Embeddings for Hierarchically Labeled Text Published at JCDL | Samuele Garda (Uni Potsdam) | Krestel, Risch | HU Berlin |
| Improving Stock Price prediction using News Articles and Company Networks | Thomas Kellermeier | Krestel, Repke | neXenio |
| Modeling News Commenters for Discussion Recommendation Published at WI | Victor Künstler | Krestel, Risch | bakdata |
| Finding Related Tables on the Web | Fabian Windheuser | Naumann, Harmouch | Palantir |
| Wikipedia Table Layout Detection and Standardization | Martin Zabel | Bleifuß, Bornemann, Naumann | Solutiance |
2018
| Topic | Student | Advisors | Last seen as/at |
| Automatically Managing News Comments Published at NAACL 2018 | Carl Ambroselli | Risch, Krestel | Palantir |
| Efficient Discovery of Matching Dependencies Published at TODS | Philipp Schirmer | Papenbrock, Naumann | bakdata |
| Discovering Conditional Functional Dependencies | Maximilian Grundke | Papenbrock, Naumann | |
| Efficient Detection of Genuine Approximate Functional Dependencies | Moritz Finke | Papenbrock, Naumann, Claudia Lehmann (SAP) | |
| Text Data Generation for Data Profiling Use Cases | Jennifer Stamm | Naumann, Papenbrock | SAP |
2017
| Topic | Student | Advisors | Last seen as/at |
| Uncertain Estimates in Cross-Platform Plan Optimization | Jonas Kemper | Kruse, Naumann | McKinsey & Company |
| Temporal Outlier Detection in Reverse Vending Machine Data | Mariya Perchyk | Naumann, Zuo | Netlight |
| Detection of Inappropriate Content in Online Comments | Dustin Gläser | Krestel | Solutiance Systems GmbH |
| Multivalued Dependency Discovery | Tim Draeger | Papenbrock, Naumann | Cognizant |
| Improving Probabilistic Topic Models using Word Embeddings Published at JCDL 2018 | Stefan Bunk | Krestel | Merantix |
| Large-Scale topic-based Analysis of political Discussions on Twitter | Jaqueline Pollak | Lazaridou, Grütze, Naumann | Seven Principles AG |
| Focused Crawling for Record Completion | Daniel Neuschäfer-Rube | Koumarelas, Naumann | |
2016
| Topic | Student | Advisors | Last seen as/at |
| Entropy-Based Topic Modeling for Multiple Domain-Specific Text Collections | Julian Risch | Krestel | HPI |
| Efficient Denial Constraint Discovery Published at VLDB 2018 | Tobias Bleifuß | Kruse, Naumann | HPI |
| Clozing the Gap: Knowledge Base Population by Answering Cloze Queries | Thomas Werkmeister | Krestel, Weissenborn (DFKI) | |
| From text to facts: Relation extraction on German company websites | Tanja Bergmann | Loster, Naumann | Rasa |
| German Organization Name Part Classification | Manuel Hegner | Loster, Naumann | bakdata |
| Classification of German Newspaper Comments Published at LWDA 2016 | Christian Godde | Krestel, Lazaridou | |
| Modeling Binding Preferences of RNA-binding Proteins with Hidden Markov Models Published in Nucleic Acids Research | David Heller | Krestel, Marsico (MPI MolGen) | MPI MolGen |
| Context-based Tweet Recommendation for News Articles | Alexander Spivak | Krestel, Gruetze | Capgemini, Berlin |
| Data Profiling Benchmark and Tool Assessment | Johannes Eschrig | Naumann | SAP |
2015
| Topic | Student | Advisors | Last seen as/at |
| Quicker Ways of Doing Fewer Things: Improved Index Structures and Algorithms for Data Profiling | Jakob Zwiener | Kruse, Naumann | Google, Zurich |
| Dialog Act Recognition in Twitter Conversations Published in parts at SIGDIAL 2015 | Elina Zarisheva | Stede, Naumann | |
| Profiling Log Messages for Unknown Error Detection | Lukas Schulze | Naumann, Jenders, Oelmüller (ePost) | |
| Efficient Order Dependency Detection Published in VLDB Journal | Philipp Langer | Naumann | IBM, Böblingen |
| Spinning a Web of Tables through Inclusion Dependencies Published in Transactions on Database Systems (TODS) | Fabian Tschirschnitz | Papenbrock, Naumann | SAP |
| Online Temporal Summarization of News Events Published at WI 2015 | Tobias Schubotz | Krestel, Jenders | dubsmash |
| Discovery of Conditional Unique Column Combinations | Jens Ehrlich | Papenbrock, Naumann | IVU Traffic Technologies AG |
| A Topic-Based Search for Microblog Posts Published at LWA 2015 | Mandy Roick | Krestel, Jenders | Audentia Management Consulting GmbH |
| Scanpath Comparison for Visual Search Analysis | Markus Hinsche | Kasneci, Naumann | Artory, Berlin |
2014
| Topic | Student | Advisors | Last seen as/at |
| Estimating Metadata of Query Results using Histograms | Cathleen Ramson | Naumann, Kruse | IVU Traffic Technologies AG |
| Using Twitter for Politician Tracking: Following our Leader's Paths | Jan Rehwaldt | Naumann, Kasneci | |
| Optimization and Parallelization of Foodborne Disease Outbreak Analyses Winner of the 2015 TDWI Award | Markus Freitag | Naumann, Filter (BfR) | eitco |
| A Content-Based Serendipity Model for News Recommendation Published at KI 2015 | Thorben Lindhauer | Kasneci/Jenders | camunda services GmbH |
| Large-Scale Twitter Hashtag Recommendation for Documents Published at TempWeb 2015 | Gary Yao | Kasneci/Grütze | Zalando |
| Optimizing Performance of Linked Data Profiling (code) | Benedikt Forchhammer | Naumann/Jentzsch | Independent software developer |
| Text Profiling: Aggregation Analyses on Sets of Texts | Matthias Kohnen | Naumann | Software Architect at SAP |
| Automatische Generierung eines Doktorvater-Stammbaumes | Thomas Kaske | Naumann | |
| Depth-first Discovery of Functional Dependencies Published at CIKM 2014 - Winner of the Best Student Paper Award | Patrick Schulze | Naumann/Abedjan | Consultant for Neon Roots |
| Estimating the Complexity and Effort of Data Integration Published at EDBT 2015 | Sebastian Kruse | Naumann/Papotti (QCRI) | HPI |
| Discovery of Strong- and Weak-Unique Column Combinations in Datasets | Claudia Lehmann | Naumann/Heise | SAP, Palo Alto |
| Discovering Matching Dependencies | Andrina Mascher | Naumann/Papenbrock | Signavio |
2013
| Topic | Student | Advisors | Last seen as/at |
| Entwicklung einer Experten-Suchmaschine | Stefan George | Kasneci | |
| Iterative Data Cleansing | Tobias Rawald | Naumann, Heise | |
| Incremental Data Profiling | Sven Viehmeier | Naumann, Abedjan | |
| Progressive Duplicate Detection Published as TKDE article | Thorsten Papenbrock | Naumann, Heise | HPI, Wissenschaftlicher Mitarbeiter |
| Optimierung regelbasierter Duplikaterkennung | Florian Thomas | Naumann, Draisbach | |
| Legislatum - Gesetzessuchmaschine für Laien | Stephan Wehrmeyer | Naumann | |
| Strategies for structure-based rewriting of SPARQL queries for data prefetching | Armin Zamani | Naumann, Lorey | SAP |
| Produktduplikaterkennung und Titelfusion | Robert Aschenbrenner | Naumann | |
2012
| Topic | Student | Advisors | Last seen as/at |
| Manuelle Duplikaterkennung mittels Crowdsourcing Second place in German Best Master Degree Award by DGIQ | David Wenzel | Vogel, Naumann | Capgemini |
| Context-aware Recommendations in Social Networks | Benjamin Emde | Abedjan, Naumann | GoEuro |
| Generating Query Suggestions by Exploiting Latent Semantics in Query Log | Fabian Lindenberg | Momtazi, Naumann | Ecotastic |
| Analyzing and Predicting Viral Tweets Published at MSND'13 workshop | Maximilian Jenders | Kasneci | PhD-Student, HPI |
| Erweiterung und Optimierung eines Graph-Clustering-Verfahrens | Eyk Kny | Böhm, Naumann | SAP Innovation Center Potsdam |
| Email Classification with Contextual Information | Michael Leben | Momtazi, Naumann | |
| Summarizing Extract-Transform-Load Workflows | MinhTuan Nguyen | Naumann, Albrecht | Software Developer, Otto Group |
| Automatic Data Normalization Using Pattern-Based Repairs | Sebastian Kölle | Naumann, Heise | |
2011
| Topic | Student | Advisors | Last seen as/at |
| Effiziente Ähnlichkeitssuche in einer großen Menge von Zeichenketten mittels Key-Value-Store Published at SSDBM 2012 | Dandy Fenz | Lange, Naumann | Developer at madvertise Mobile Advertising GmbH |
| Automatisierte Konfiguration des D-Indexes zur Ähnlichkeitssuche | Matthias Pohl | Lange, Naumann | cpi GmbH, Berlin |
| HDRS: A Scalable Peer-to-Peer RDF Storage Infrastructure for Hadoop Published at I-Semantics 2012 | Daniel Hefenbrock | Böhm, Naumann | Developer at Microsoft |
| Überlappendes Clustering von Konzepten | Johannes Gosda | Naumann, Böhm | Developer at Pass Consulting |
| Advanced Service Discovery: Beyond Full-text Search | Thomas Berger | AbuJarour, Naumann | Inubit AG, Berlin |
| Concept matching in the Web of Data Published at LDOW 2012 | Toni Grütze | Böhm, Naumann | Wissenschaftlicher Mitarbeiter, HPI |
2010
| Topic | Student | Advisors | Last seen as/at |
| Duplikaterkennung unter Verwendung unstrukturierter Anteile | David Sonnabend | Naumann, AbuJarour, Vogel | lead on GmbH |
| Extraction of Management Concepts from Web Sites for Sentiment Analysis | Arvid Heise | Naumann, Walgenbach (Jena) | Research Assistant, HPI |
| Wikipedia cross-lingual Concept Identification and Infobox Alignment First place in IQ Best Master Degree Contest of dgiq Published in Information Systems journal | Daniel Rinser | Naumann, Lange | |
| Optimizing query execution to improve the energy efficiency of database management systems | Tobias Flach | Naumann | PhD Student at UCSC |
| Discovering unique column combinations within a database Published at CIKM 2011 | Ziawasch Abedjan | Naumann | PostDoc, MIT |
| A flexible index structure for interactive data profiling | Ingmar Rötzler | Böhm, Naumann | |
| ETL process recommendation | Andriy Vedrych | Naumann, Albrecht | |
2009
| Form | Topic | Student | Advisors | Last seen as/at |
| Master's thesis (extern FernUni Hagen) | Partitionierung zur effizienten Duplikaterkennung in relationalen Daten First place in IQ Best Master Degree Contest of dgiq Published at QDB 2009 | Uwe Draisbach | Naumann, Schlageter (U Hagen) | Research Assistant, HPI |
| Master's thesis | Mining Webservices for Metadata | Martin Probst | Kaufer, Naumann | Senior Software Engineer, EMC |
| Master's thesis | Conception and Development of a user-validated repository for user research information | Alexander Renneberg | Naumann | Consultant, d-labs |
| Diplomarbeit (extern HU Berlin) | Nutzung von Statistiken über Daten-Overlap zur Anfrageoptimierung in Peer Data Management Systemen | Veronique Tietz | Roth, Naumann | |
| Master's thesis | Entwicklung von Sonderprofilen für die effektive Duplikaterkennung in der SCHUFA Personendatenbank | Christin Koitschka | Bleiholder, Weis, Naumann | Deutsche Bank |
| Diplomarbeit (extern HU Berlin) | Merging Extract, Transform, Load Processes | Karsten Draba | Albrecht, Leser, Naumann | |
| Master's thesis | | | | Senior Software Developer at engageSPARK |
2008
| Form | Topic | Student | Advisors | Last seen as/at |
| Master's thesis | Parallelisierung von Graphduplikaterkennung Published in TKDE 2011 | Maik Taubert | Weis, Szott, Naumann | Biotronik GmbH |
2007
| Form | Topic | Student | Advisors | Last seen as/at |
| Diplomarbeit (extern, HU Berlin) | Automatisiertes Auffinden von Präfix- und Suffix-Inklusionsabhängigkeiten in relationalen Datenbankmanagementsystemen First place in IQ Best Master Degree Contest of dgiq | Jan Hegewald | Leser, Naumann, Bauckmann | Teamleiter bei idealo internet GmbH |
| Diplomarbeit (extern, Uni Halle) | XML Schema Matching unter Verwendung der Tree Edit Distance | Sascha Szott | Naumann, Brass (U Halle), Weis, Kaufer | Zuse Institut (ZIB), Berlin |
2005 – 2006 at Humboldt Universität
| Form | Topic | Student | Advisors | Last seen as/at |
| Diplomarbeit (extern HU Berlin) | Entwurf eines Peer Data Management Systems mit Steuerungs- und Simulationskomponente | Martin Schweigert | Naumann | IVU Traffic Tech. AG |
| Diplomarbeit (extern HU Berlin) | Entwicklung einer Testumgebung für ein Peer Data Management System | Tobias Hübner | Naumann | ID-Berlin GmbH |
| Diplomarbeit (extern HU Berlin) | Tree-Edit-Distance based Schema Matching | Evgenia Ershova | Naumann | |
| Diplomarbeit (extern HU Berlin) | Schemaintegration auf der Grundlage von Schema-Mappings | Jana Bauckmann | Naumann | |
| Diplomarbeit (extern HU Berlin) | Entwicklung von Klassifikation von Schema Mappings und deren Anwendung in einer Portallösung | Jens Harzer | Naumann | Wessendorf Software & Consulting GmbH |
| Diplomarbeit (extern HU Berlin) | Duplikaterkennung in XML Daten mit der Sorted Neighborhood Methode Published at EDBT 2006 | Sven Puhlmann | Naumann | Center for Language Technology - Macquarie University, Australia |
| Diplomarbeit (extern HU Berlin) | Kombination von Schema Matching Verfahren zur semi-automatischen Integration von Fahrzeug-Daten | Lenka Ivantysynova | Naumann | Graduiertenkolleg "Verteilte Informationssys." |
| Diplomarbeit (extern HU Berlin) | Datentransformation mittels Schema Mapping Published at BTW 2007 | Frank Legler | Naumann | IBM Forschung |