Completed master's theses

Listed here are student theses completed since winter semester 2005/2006. All master's theses are available as PDF files upon request. Please contact office-naumann.

2026

Topic | Student | Advisors | Last seen as/at
DV-Tree: A Multi-Version Index for Functional and Order Dependency Validation | Jonas Baltruschat | Felix Naumann, Daniel Lindner |
Cardinality Estimation with Data Dependencies | Paul Rößler | Felix Naumann, Daniel Lindner, Martin Boissier | Amazon, Berlin
Using Coordinated Probabilistic Filters to Reduce Intermediate Query Results | Tobias Jordan | Tilmann Rabl, Martin Boissier, Daniel Lindner |
Datenzentrische KI-Methoden zur Schadensvorhersage in Fahrzeugtransportnetzwerken | Finn Freiheit | Felix Naumann |
Incremental Order Dependency Discovery | Paul Sieben | Felix Naumann, Youri Kaminsky |
Incorporating Uncertainty in the Data Quality Assessment Process | Jannis Berndt | Lisa Ehrlinger, Felix Naumann |

2025

Topic | Student | Advisors | Last seen as/at
ChunkQuest: Navigating the Multimodal Landscape for Retrieval-augmented Generation | David Kuska | Felix Naumann, Lukas Laskowski, Francesco Pugnaloni | Machine Learning Engineer at Berliner Energie und Wärme
Detection of Subsequence Anomalies in Univariate Time Series with Convolutional Kernels | Stefan Spangenberg | Felix Naumann, Sebastian Schmidl |
Fully Unsupervised Entity Resolution with Fine-Tuned Language Embeddings | Florian Sold | Felix Naumann, Fabian Panse | Innovation Tech Lead at Horizon56
Parameter-Efficient Domain Adaptation for Entity Matching | Emanuele De Rossi | Felix Naumann, Matteo Paganelli, Fabian Panse |
Meta-Data based Synthesis of Realistic Tabular Data using Pretrained Language Models | Philipp Hildebrandt | Felix Naumann, Fabian Panse | PhD student at HPI

2024

Topic | Student | Advisors | Last seen as/at
The Effects of Data Quality on Named Entity Recognition (published at WNUT@EACL24, best paper award) | Divya Bhadauria (Universität Potsdam) | Felix Naumann, Ralf Krestel, Alejandro Sierra Múnera | PhD student at HPI
Mining Larger Persistent Patterns in Temporal Networks | Jakob Edding (HPI) | Felix Naumann, Tobias Bleifuß |

2023

Topic | Student | Advisors | Last seen as/at
NumbER - Entity Resolution on Numerical Data | Lukas Laskowski | Fabian Panse, Felix Naumann | PhD student at HPI
Correlation Anomaly Detection in High-Dimensional Time Series | Niklas Köhnecke | Schmidl/Wenig, Papenbrock, Naumann | TNG Technology Consulting
Efficient Discovery of Conditional Unique Column Combinations | Torben Meyer | Youri Kaminsky, Felix Naumann | bakdata
Semantic Column Type Detection Using Pretrained Language Models | Jonathan Haas | Felix Naumann, Hazar Harmouch |
Quantification of Diversity in Structured Data | Lisa Koeritz | Felix Naumann, Hazar Harmouch | Researcher at Internationales Zentrum für Ethik in den Wissenschaften (IZEW)
Leveraging Data Histories to Improve Entity Resolution | Sebastian Apitz | Felix Naumann, Leon Bornemann | kontura GmbH

2022

Topic | Student | Advisors | Last seen as/at
Estimating Population-Based Completeness in Web Tables | Johanna Schwinn (Universität Potsdam) | Naumann |
Der Einfluss von Datenqualitätsmängeln auf Fairness in KI-Systemen | Isabel Bär | Naumann |
HYPEX: Explainable Hyperparameter Optimization in Time Series Anomaly Detection (published at BTW 2023) | Mats Pörschke | Schmidl, Wenig, Naumann, Papenbrock | Software Engineer at Apple Heidelberg
Mona L. = M. Lisa = La Gioconda? Jointly Linking Entities and Extracting Relations with BERT | Lucas Silbernagel (HWR) | Jain, Naumann, Schaal | Analytics Engineer at Knowunity
Discovering Similarity Inclusion Dependencies (published at SIGMOD 2023) | Youri Kaminsky | Naumann, Pena | PhD student at HPI
Discovery of Complementation Dependencies | Jonas Hering | Naumann, Harmouch | Software Engineer at voize
Time Series Anomaly Detection: An Aircraft Turbine Case Study | Jacopo Roberto Nicosia (University of Milano-Bicocca) | Schmidl, Wenig, Papenbrock |
Efficient Ultrafine-Grained Typing of Named Entities (published at JCDL 2023) | Jan Westphal | Naumann, Krestel, Sierra Múnera | ML Engineer at voize
Workload-Driven Query Optimization Using Data Dependencies (published at CIDR 2022) | Daniel Lindner | Felix Naumann, Hasso Plattner, Michael Perscheid, Jan Koßmann, Marcel Weisgut | PhD student at HPI

2021

Topic | Student | Advisors | Last seen as/at
Diachronic Alignment of Induced Word Senses | Robert Schwanhold | Krestel, Repke | SAP Innovation Center
Inferring Regular Expressions from Database Columns | Tobias Niedling | Harmouch, Naumann |
Extracting Tables From Plain Text (published at BTW 23) | Leonardo Hübscher | Naumann, Jiang | IT Consultant at Netlight
Multi-Aspect Embeddings for Fiction Novels (published at WI 21) | Lasse Kohlmeyer | Krestel, Repke | Capgemini
Distributed Duplicate Detection on Streaming Data | Jakob Köhler | Papenbrock |
Continuous Learning for Hate Speech Detection | Sezi Dwi Sagarianti Prasetyo (Uni Potsdam) | Krestel |
Generating Rap Lyrics with Flow and Rhythm | Noel Danz | Krestel, Repke |

2020

Topic | Student | Advisors | Last seen as/at
Actor based Database System for Analytical Queries (published at BTW 2021) | Julian Weise | Papenbrock | Snowflake
Distributed Graph Based Approximate Nearest Neighbor Search | Juliane Waack | Papenbrock | Snowflake
Efficient Distributed Discovery of Bidirectional Order Dependencies (published in The VLDB Journal) | Sebastian Schmidl | Papenbrock | HPI
Distributed Detection of Sequential Anomalies in Time Related Data Sequences (published in The VLDB Journal) | Johannes Schneider | Papenbrock, Wenig | SAP
Enriching Document Embeddings With Domain Knowledge (published at NAACL 2021) | Philipp Hager | Krestel, Risch | PhD student at Amsterdam University
Distributed Unique Column Combination Discovery | Benjamin Feldmann | Papenbrock, Thomas Bläsius, Martin Schirneck | bakdata
Multi-Prototype Diachronic Word Embedding | Maxi Fischer | Krestel, Repke | Sopra Steria
Inclusion Dependency Discovery on Streaming Data | Alexander Preuss | Papenbrock | bakdata
Reactive Inclusion Dependency Discovery | Frederic Schneider | Papenbrock | Netlight, Berlin

2019

Topic | Student | Advisors | Last seen as/at
Context-aware Classification of News Comments | Johannes Filter | Krestel, Risch | AlgorithmWatch
Jointly Learning Document and Label Embeddings for Hierarchically Labeled Text (published at JCDL) | Samuele Garda (Uni Potsdam) | Krestel, Risch | HU Berlin
Improving Stock Price prediction using News Articles and Company Networks (published at MIDAS) | Thomas Kellermeier | Krestel, Repke | neXenio
Modeling News Commenters for Discussion Recommendation (published at WI) | Victor Künstler | Krestel, Risch | bakdata
Finding Related Tables on the Web | Fabian Windheuser | Naumann, Harmouch | Palantir
Wikipedia Table Layout Detection and Standardization | Martin Zabel | Bleifuß, Bornemann, Naumann | Solutiance

2018

Topic | Student | Advisors | Last seen as/at
Automatically Managing News Comments (published at NAACL 2018) | Carl Ambroselli | Risch, Krestel | Palantir
Efficient Discovery of Matching Dependencies (published at TODS) | Philipp Schirmer | Papenbrock, Naumann | bakdata
Discovering Conditional Functional Dependencies | Maximilian Grundke | Papenbrock, Naumann |
Efficient Detection of Genuine Approximate Functional Dependencies | Moritz Finke | Papenbrock, Naumann, Claudia Lehmann (SAP) | Google
Text Data Generation for Data Profiling Use Cases | Jennifer Stamm | Naumann, Papenbrock | SAP

2017

Topic | Student | Advisors | Last seen as/at
Uncertain Estimates in Cross-Platform Plan Optimization | Jonas Kemper | Kruse, Naumann | McKinsey & Company
Temporal Outlier Detection in Reverse Vending Machine Data | Mariya Perchyk | Naumann, Zuo | Netlight
Detection of Inappropriate Content in Online Comments | Dustin Gläser | Krestel | Solutiance Systems GmbH
Multivalued Dependency Discovery | Tim Draeger | Papenbrock, Naumann | Cognizant
Improving Probabilistic Topic Models using Word Embeddings (published at JCDL 2018) | Stefan Bunk | Krestel | Merantix
Large-Scale topic-based Analysis of political Discussions on Twitter | Jaqueline Pollak | Lazaridou, Grütze, Naumann | Seven Principles AG
Focused Crawling for Record Completion | Daniel Neuschäfer-Rube | Koumarelas, Naumann |

2016

Topic | Student | Advisors | Last seen as/at
Entropy-Based Topic Modeling for Multiple Domain-Specific Text Collections (published at JCDL 2018) | Julian Risch | Krestel | HPI
Efficient Denial Constraint Discovery (published at VLDB 2018) | Tobias Bleifuß | Kruse, Naumann | HPI
Clozing the Gap: Knowledge Base Population by Answering Cloze Queries | Thomas Werkmeister | Krestel, Weissenborn (DFKI) |
From text to facts: Relation extraction on German company websites | Tanja Bergmann | Loster, Naumann | Rasa
German Organization Name Part Classification | Manuel Hegner | Loster, Naumann | bakdata
Classification of German Newspaper Comments (published at LWDA 2016) | Christian Godde | Krestel, Lazaridou |
Modeling Binding Preferences of RNA-binding Proteins with Hidden Markov Models (published in Nucleic Acids Research) | David Heller | Krestel, Marsico (MPI MolGen) | MPI MolGen
Context-based Tweet Recommendation for News Articles | Alexander Spivak | Krestel, Gruetze | Capgemini, Berlin
Data Profiling Benchmark and Tool Assessment | Johannes Eschrig | Naumann | SAP

2015

Topic | Student | Advisors | Last seen as/at
Quicker Ways of Doing Fewer Things: Improved Index Structures and Algorithms for Data Profiling | Jakob Zwiener | Kruse, Naumann | Google, Zurich
Dialog Act Recognition in Twitter Conversations (published in parts at SIGDIAL 2015) | Elina Zarisheva | Stede, Naumann |
Profiling Log Messages for Unknown Error Detection | Lukas Schulze | Naumann, Jenders, Oelmüller (ePost) |
Efficient Order Dependency Detection (published in VLDB Journal) | Philipp Langer | Naumann | IBM, Böblingen
Spinning a Web of Tables through Inclusion Dependencies (published in Transactions on Database Systems (TODS)) | Fabian Tschirschnitz | Papenbrock, Naumann | SAP
Online Temporal Summarization of News Events (published at WI 2015) | Tobias Schubotz | Krestel, Jenders | dubsmash
Discovery of Conditional Unique Column Combinations | Jens Ehrlich | Papenbrock, Naumann | IVU Traffic Technologies AG
A Topic-Based Search for Microblog Posts (published at LWA 2015) | Mandy Roick | Krestel, Jenders | Audentia Management Consulting GmbH
Scanpath Comparison for Visual Search Analysis | Markus Hinsche | Kasneci, Naumann | Artory, Berlin

2014

Topic | Student | Advisors | Last seen as/at
Estimating Metadata of Query Results using Histograms | Cathleen Ramson | Naumann, Kruse | IVU Traffic Technologies AG
Using Twitter for Politician Tracking: Following our Leader's Paths | Jan Rehwaldt | Naumann, Kasneci |
Optimization and Parallelization of Foodborne Disease Outbreak Analyses (winner of the 2015 TDWI Award) | Markus Freitag | Naumann, Filter (BfR) | eitco
A Content-Based Serendipity Model for News Recommendation (published at KI 2015) | Thorben Lindhauer | Kasneci/Jenders | camunda services GmbH
Large-Scale Twitter Hashtag Recommendation for Documents (published at TempWeb 2015) | Gary Yao | Kasneci/Grütze | Zalando
Optimizing Performance of Linked Data Profiling (code; published at PROFILES 2014) | Benedikt Forchhammer | Naumann/Jentzsch | Independent software developer
Text Profiling: Aggregation Analyses on Sets of Texts | Matthias Kohnen | Naumann | Software Architect at SAP
Automatische Generierung eines Doktorvater-Stammbaumes | Thomas Kaske | Naumann |
Depth-first Discovery of Functional Dependencies (published at CIKM 2014, winner of the Best Student Paper Award) | Patrick Schulze | Naumann/Abedjan | Consultant for Neon Roots
Estimating the Complexity and Effort of Data Integration (published at EDBT 2015) | Sebastian Kruse | Naumann/Papotti (QCRI) | HPI
Discovery of Strong- and Weak-Unique Column Combinations in Datasets | Claudia Lehmann | Naumann/Heise | SAP, Palo Alto
Discovering Matching Dependencies | Andrina Mascher | Naumann/Papenbrock | Signavio

2013

Topic | Student | Advisors | Last seen as/at
Entwicklung einer Experten-Suchmaschine | Stefan George | Kasneci |
Iterative Data Cleansing | Tobias Rawald | Naumann, Heise |
Incremental Data Profiling | Sven Viehmeier | Naumann, Abedjan |
Progressive Duplicate Detection (published as TKDE article) | Thorsten Papenbrock | Naumann, Heise | Research Assistant, HPI
Optimierung regelbasierter Duplikaterkennung | Florian Thomas | Naumann, Draisbach |
Legislatum - Gesetzessuchmaschine für Laien | Stephan Wehrmeyer | Naumann |
Strategies for structure-based rewriting of SPARQL queries for data prefetching | Armin Zamani | Naumann, Lorey | SAP
Produktduplikaterkennung und Titelfusion | Robert Aschenbrenner | Naumann |

2012

Topic | Student | Advisors | Last seen as/at
Manuelle Duplikaterkennung mittels Crowdsourcing (second place in German Best Master Degree Award by DGIQ) | David Wenzel | Vogel, Naumann | Capgemini
Context-aware Recommendations in Social Networks | Benjamin Emde | Abedjan, Naumann | GoEuro
Generating Query Suggestions by Exploiting Latent Semantics in Query Log | Fabian Lindenberg | Momtazi, Naumann | Ecotastic
Analyzing and Predicting Viral Tweets (published at MSND'13 workshop) | Maximilian Jenders | Kasneci | PhD student, HPI
Erweiterung und Optimierung eines Graph-Clustering-Verfahrens | Eyk Kny | Böhm, Naumann | SAP Innovation Center Potsdam
Email Classification with Contextual Information | Michael Leben | Momtazi, Naumann |
Summarizing Extract-Transform-Load Workflows | MinhTuan Nguyen | Naumann, Albrecht | Software Developer, Otto Group
Automatic Data Normalization Using Pattern-Based Repairs | Sebastian Kölle | Naumann, Heise |

2011

Topic | Student | Advisors | Last seen as/at
Effiziente Ähnlichkeitssuche in einer großen Menge von Zeichenketten mittels Key-Value-Store (published at SSDBM 2012) | Dandy Fenz | Lange, Naumann | Developer at madvertise Mobile Advertising GmbH
Automatisierte Konfiguration des D-Indexes zur Ähnlichkeitssuche | Matthias Pohl | Lange, Naumann | cpi GmbH, Berlin
HDRS: A Scalable Peer-to-Peer RDF Storage Infrastructure for Hadoop (published at I-Semantics 2012) | Daniel Hefenbrock | Böhm, Naumann | Developer at Microsoft
Überlappendes Clustering von Konzepten | Johannes Gosda | Naumann, Böhm | Developer at Pass Consulting
Advanced Service Discovery: Beyond Full-text Search | Thomas Berger | AbuJarour, Naumann | Inubit AG, Berlin
Concept matching in the Web of Data (published at LDOW 2012) | Toni Grütze | Böhm, Naumann | Research Assistant, HPI

2010

Topic | Student | Advisors | Last seen as/at
Duplikaterkennung unter Verwendung unstrukturierter Anteile | David Sonnabend | Naumann, AbuJarour, Vogel | lead on GmbH
Extraction of Management Concepts from Web Sites for Sentiment Analysis | Arvid Heise | Naumann, Walgenbach (Jena) | Research Assistant, HPI
Wikipedia cross-lingual Concept Identification and Infobox Alignment (first place in IQ Best Master Degree Contest of dgiq; published in Information Systems journal) | Daniel Rinser | Naumann, Lange |
Optimizing query execution to improve the energy efficiency of database management systems | Tobias Flach | Naumann | PhD Student at UCSC
Discovering unique column combinations within a database (published at CIKM 2011) | Ziawasch Abedjan | Naumann | PostDoc, MIT
A flexible index structure for interactive data profiling | Ingmar Rötzler | Böhm, Naumann |
ETL process recommendation | Andriy Vedrych | Naumann, Albrecht |

2009

Form | Topic | Student | Advisors | Last seen as/at
Master's thesis (external, FernUni Hagen) | Partitionierung zur effizienten Duplikaterkennung in relationalen Daten (first place in IQ Best Master Degree Contest of dgiq; published at QDB 2009) | Uwe Draisbach | Naumann, Schlageter (U Hagen) | Research Assistant, HPI
Master's thesis | Mining Webservices for Metadata | Martin Probst | Kaufer, Naumann | Senior Software Engineer, EMC
Master's thesis | Conception and Development of a user-validated repository for user research information | Alexander Renneberg | Naumann | Consultant, d-labs
Diploma thesis (external, HU Berlin) | Nutzung von Statistiken über Daten-Overlap zur Anfrageoptimierung in Peer Data Management Systemen | Veronique Tietz | Roth, Naumann |
Master's thesis | Entwicklung von Sonderprofilen für die effektive Duplikaterkennung in der SCHUFA Personendatenbank | Christin Koitschka | Bleiholder, Weis, Naumann | Deutsche Bank
Diploma thesis (external, HU Berlin) | Merging Extract, Transform, Load Processes | Karsten Draba | Albrecht, Leser, Naumann |
Master's | | | | Senior Software Developer at engageSPARK

2008

Form | Topic | Student | Advisors | Last seen as/at
Master's thesis | Parallelisierung von Graphduplikaterkennung (published in TKDE 2011) | Maik Taubert | Weis, Szott, Naumann | Biotronik GmbH

2007

Form | Topic | Student | Advisors | Last seen as/at
Diploma thesis (external, HU Berlin) | Automatisiertes Auffinden von Präfix- und Suffix-Inklusionsabhängigkeiten in relationalen Datenbankmanagementsystemen (first place in IQ Best Master Degree Contest of dgiq) | Jan Hegewald | Leser, Naumann, Bauckmann | Team lead at idealo internet GmbH
Diploma thesis (external, Uni Halle) | XML Schema Matching unter Verwendung der Tree Edit Distance | Sascha Szott | Naumann, Brass (U Halle), Weis, Kaufer | Zuse Institut (ZIB), Berlin

2005 – 2006 at Humboldt Universität

Form | Topic | Student | Advisors | Last seen as/at
Diploma thesis (external, HU Berlin) | Entwurf eines Peer Data Management Systems mit Steuerungs- und Simulationskomponente | Martin Schweigert | Naumann | IVU Traffic Tech. AG
Diploma thesis (external, HU Berlin) | Entwicklung einer Testumgebung für ein Peer Data Management System | Tobias Hübner | Naumann | ID-Berlin GmbH
Diploma thesis (external, HU Berlin) | Tree-Edit-Distance based Schema Matching | Evgenia Ershova | Naumann |
Diploma thesis (external, HU Berlin) | Schemaintegration auf der Grundlage von Schema-Mappings | Jana Bauckmann | Naumann | WIdO
Diploma thesis (external, HU Berlin) | Entwicklung von Klassifikation von Schema Mappings und deren Anwendung in einer Portallösung | Jens Harzer | Naumann | Wessendorf Software & Consulting GmbH
Diploma thesis (external, HU Berlin) | Duplikaterkennung in XML Daten mit der Sorted Neighborhood Methode (published at EDBT 2006) | Sven Puhlmann | Naumann | Center for Language Technology - Macquarie University, Australia
Diploma thesis (external, HU Berlin) | Kombination von Schema Matching Verfahren zur semi-automatischen Integration von Fahrzeug-Daten | Lenka Ivantysynova | Naumann | Graduiertenkolleg "Verteilte Informationssys."
Diploma thesis (external, HU Berlin) | Datentransformation mittels Schema Mapping (published at BTW 2007) | Frank Legler | Naumann | IBM Forschung & Entwicklung, Böblingen