Prof. Dr. Felix Naumann

Completed master's theses

Listed here are student theses since the winter semester 2005/2006. All master's theses are available upon request as PDF files. Please contact office-naumann.


Topic | Student | Advisors | Last seen as/at
The Effects of Data Quality on Named Entity Recognition (published at WNUT@EACL24) | Divya Bhadauria (Universität Potsdam) | Felix Naumann, Ralf Krestel, Alejandro Sierra Múnera |


Topic | Student | Advisors | Last seen as/at
NumbER - Entity Resolution on Numerical Data | Lukas Laskowski | Fabian Panse, Felix Naumann | PhD student at HPI
Correlation Anomaly Detection in High-Dimensional Time Series | Niklas Köhnecke | Schmidl/Wenig, Papenbrock, Naumann | TNG Technology Consulting
Efficient Discovery of Conditional Unique Column Combinations | Torben Meyer | Youri Kaminsky, Felix Naumann | bakdata
Semantic Column Type Detection Using Pretrained Language Models | Jonathan Haas | Felix Naumann, Hazar Harmouch |
Quantification of Diversity in Structured Data | Lisa Koeritz | Felix Naumann, Hazar Harmouch | Researcher at Internationales Zentrum für Ethik in den Wissenschaften (IZEW)
Leveraging Data Histories to Improve Entity Resolution | Sebastian Apitz | Felix Naumann, Leon Bornemann | kontura GmbH


Topic | Student | Advisors | Last seen as/at
Estimating Population-Based Completeness in Web Tables | Johanna Schwinn (Universität Potsdam) | Naumann |
Der Einfluss von Datenqualitätsmängeln auf Fairness in KI-Systemen | Isabel Bär | Naumann |
HYPEX: Explainable Hyperparameter Optimization in Time Series Anomaly Detection (published at BTW 2023) | Mats Pörschke | Schmidl, Wenig, Naumann, Papenbrock | Software Engineer at Apple Heidelberg
Mona L. = M. Lisa = La Gioconda? Jointly Linking Entities and Extracting Relations with BERT | Lucas Silbernagel (HWR) | Jain, Naumann, Schaal | Analytics Engineer at Knowunity
Discovering Similarity Inclusion Dependencies (published at SIGMOD 2023) | Youri Kaminsky | Naumann, Pena | PhD student at HPI
Discovery of Complementation Dependencies | Jonas Hering | Naumann, Harmouch | Software Engineer at voize
Time Series Anomaly Detection: An Aircraft Turbine Case Study | Jacopo Roberto Nicosia (University of Milano-Bicocca) | Schmidl, Wenig, Papenbrock |
Efficient Ultrafine-Grained Typing of Named Entities (published at JCDL 2023) | Jan Westphal | Naumann, Krestel, Sierra Múnera | ML Engineer at voize


Topic | Student | Advisors | Last seen as/at
Diachronic Alignment of Induced Word Senses | Robert Schwanhold | Krestel, Repke | SAP Innovation Center
Inferring Regular Expressions from Database Columns | Tobias Niedling | Harmouch, Naumann |
Extracting Tables From Plain Text (published at BTW 23) | Leonardo Hübscher | Naumann, Jiang | IT Consultant at Netlight
Multi-Aspect Embeddings for Fiction Novels (published at WI 21) | Lasse Kohlmeyer | Krestel, Repke | Capgemini
Distributed Duplicate Detection on Streaming Data | Jakob Köhler | Papenbrock |
Continuous Learning for Hate Speech Detection | Sezi Dwi Sagarianti Prasetyo (Uni Potsdam) | |
Generating Rap Lyrics with Flow and Rhythm | Noel Danz | Krestel, Repke |


Topic | Student | Advisors | Last seen as/at
Actor based Database System for Analytical Queries (published at BTW 2021) | Julian Weise | Papenbrock | Snowflake
Distributed Graph Based Approximate Nearest Neighbor Search | Juliane Waack | Papenbrock | Snowflake
Efficient Distributed Discovery of Bidirectional Order Dependencies (published in The VLDB Journal) | Sebastian Schmidl | Papenbrock | HPI
Distributed Detection of Sequential Anomalies in Time Related Data Sequences (published in The VLDB Journal) | Johannes Schneider | Papenbrock, Wenig | SAP
Enriching Document Embeddings With Domain Knowledge (published at NAACL 2021) | Philipp Hager | Krestel, Risch | PhD student at Amsterdam University
Distributed Unique Column Combination Discovery | Benjamin Feldmann | Papenbrock, Thomas Bläsius, Martin Schirneck | bakdata
Multi-Prototype Diachronic Word Embedding | Maxi Fischer | Krestel, Repke | Sopra Steria
Inclusion Dependency Discovery on Streaming Data | Alexander Preuss | Papenbrock | bakdata
Reactive Inclusion Dependency Discovery | Frederic Schneider | Papenbrock | Netlight, Berlin


Topic | Student | Advisors | Last seen as/at
Context-aware Classification of News Comments | Johannes Filter | Krestel, Risch | AlgorithmWatch
Jointly Learning Document and Label Embeddings for Hierarchically Labeled Text (published at JCDL) | Samuele Garda (Uni Potsdam) | Krestel, Risch | HU Berlin
Improving Stock Price prediction using News Articles and Company Networks (published at MIDAS) | Thomas Kellermeier | Krestel, Repke |
Modeling News Commenters for Discussion Recommendation (published at WI) | Victor Künstler | Krestel, Risch |
Finding Related Tables on the Web | Fabian Windheuser | Naumann, Harmouch | Palantir
Wikipedia Table Layout Detection and Standardization | Martin Zabel | Bleifuß, Bornemann, Naumann | Solutiance


Topic | Student | Advisors | Last seen as/at
Automatically Managing News Comments (published at NAACL 2018) | Carl Ambroselli | Risch, Krestel | Palantir
Efficient Discovery of Matching Dependencies (published at TODS) | Philipp Schirmer | Papenbrock, Naumann | bakdata
Discovering Conditional Functional Dependencies | Maximilian Grundke | Papenbrock, Naumann |
Efficient Detection of Genuine Approximate Functional Dependencies | Moritz Finke | Papenbrock, Naumann, Claudia Lehmann (SAP) | Google
Text Data Generation for Data Profiling Use Cases | Jennifer Stamm | Naumann, Papenbrock | SAP


Topic | Student | Advisors | Last seen as/at
Uncertain Estimates in Cross-Platform Plan Optimization | Jonas Kemper | Kruse, Naumann | McKinsey & Company
Temporal Outlier Detection in Reverse Vending Machine Data | Mariya Perchyk | Naumann, Zuo | Netlight
Detection of Inappropriate Content in Online Comments | Dustin Gläser | Krestel | Solutiance Systems GmbH
Multivalued Dependency Discovery | Tim Draeger | Papenbrock, Naumann | Cognizant
Improving Probabilistic Topic Models using Word Embeddings (published at JCDL 2018) | Stefan Bunk | Krestel | Merantix
Large-Scale topic-based Analysis of political Discussions on Twitter | Jaqueline Pollak | Lazaridou, Grütze, Naumann | Seven Principles AG
Focused Crawling for Record | Daniel Neuschäfer-Rube | Koumarelas, Naumann |


Topic | Student | Advisors | Last seen as/at
Entropy-Based Topic Modeling for Multiple Domain-Specific Text Collections (published at JCDL 2018) | Julian Risch | |
Efficient Denial Constraint Discovery (published at VLDB 2018) | Tobias Bleifuß | Kruse, Naumann | HPI
Clozing the Gap: Knowledge Base Population by Answering Cloze Queries | Thomas Werkmeister | Krestel, Weissenborn (DFKI) |
From text to facts: Relation extraction on German company websites | Tanja Bergmann | Loster, Naumann | Rasa
German Organization Name Part Classification | Manuel Hegner | Loster, Naumann | bakdata
Classification of German Newspaper Comments (published at LWDA 2016) | Christian Godde | Krestel, Lazaridou |
Modeling Binding Preferences of RNA-binding Proteins with Hidden Markov Models (published in Nucleic Acids Research) | David Heller | Krestel, Marsico (MPI MolGen) | MPI MolGen
Context-based Tweet Recommendation for News Articles | Alexander Spivak | Krestel, Gruetze | Capgemini, Berlin
Data Profiling Benchmark and Tool Assessment | Johannes Eschrig | Naumann | SAP



Topic | Student | Advisors | Last seen as/at
Quicker Ways of Doing Fewer Things: Improved Index Structures and Algorithms for Data Profiling | Jakob Zwiener | Kruse, Naumann | Google, Zurich
Dialog Act Recognition in Twitter Conversations (published in parts at SIGDIAL 2015) | Elina Zarisheva | Stede, Naumann |
Profiling Log Messages for Unknown Error Detection | Lukas Schulze | Naumann, Jenders, Oelmüller (ePost) |
Efficient Order Dependency Detection (published in VLDB Journal) | Philipp Langer | Naumann | IBM, Böblingen
Spinning a Web of Tables through Inclusion Dependencies (published in Transactions on Database Systems (TODS)) | Fabian Tschirschnitz | Papenbrock, Naumann | SAP
Online Temporal Summarization of News Events (published at WI 2015) | Tobias Schubotz | Krestel, Jenders | dubsmash
Discovery of Conditional Unique Column Combinations | Jens Ehrlich | Papenbrock, Naumann | IVU Traffic Technologies AG
A Topic-Based Search for Microblog Posts (published at LWA 2015) | Mandy Roick | Krestel, Jenders | Audentia Management Consulting GmbH
Scanpath Comparison for Visual Search Analysis | Markus Hinsche | Kasneci, Naumann | Artory, Berlin



Topic | Student | Advisors | Last seen as/at
Estimating Metadata of Query Results using Histograms | Cathleen Ramson | Naumann, Kruse | IVU Traffic Technologies AG
Using Twitter for Politician Tracking: Following our Leader's Paths | Jan Rehwaldt | Naumann, Kasneci |
Optimization and Parallelization of Foodborne Disease Outbreak Analyses (winner of the 2015 TDWI Award) | Markus Freitag | Naumann, Filter (BfR) | eitco
A Content-Based Serendipity Model for News Recommendation (published at KI 2015) | Thorben Lindhauer | Kasneci/Jenders | camunda services GmbH
Large-Scale Twitter Hashtag Recommendation for Documents (published at TempWeb 2015) | Gary Yao | Kasneci/Grütze | Zalando
Optimizing Performance of Linked Data Profiling (code) (published at PROFILES 2014) | Benedikt Forchhammer | Naumann/Jentzsch | Independent software developer
Text Profiling: Aggregation Analyses on Sets of Texts | Matthias Kohnen | Naumann | Software Architect at SAP
Automatische Generierung eines Doktorvater-Stammbaumes | Thomas Kaske | Naumann |
Depth-first Discovery of Functional Dependencies (published at CIKM 2014, winner of the Best Student Paper Award) | Patrick Schulze | Naumann/Abedjan | Consultant for Neon Roots
Estimating the Complexity and Effort of Data Integration (published at EDBT 2015) | Sebastian Kruse | Naumann/Papotti (QCRI) | HPI
Discovery of Strong- and Weak-Unique Column Combinations in Datasets | Claudia Lehmann | Naumann/Heise | SAP, Palo Alto
Discovering Matching Dependencies | Andrina Mascher | Naumann/Papenbrock | Signavio


Topic | Student | Advisors | Last seen as/at
Entwicklung einer Experten-Suchmaschine | Stefan George | Kasneci |
Iterative Data Cleansing | Tobias Rawald | Naumann, Heise |
Incremental Data Profiling | Sven Viehmeier | Naumann, Abedjan |
Progressive Duplicate Detection (published as TKDE article) | Thorsten Papenbrock | Naumann, Heise | Research Assistant, HPI
Optimierung regelbasierter Duplikaterkennung | Florian Thomas | Naumann, Draisbach |
Legislatum - Gesetzessuchmaschine für Laien | Stephan Wehrmeyer | Naumann |
Strategies for structure-based rewriting of SPARQL queries for data prefetching | Armin Zamani | Naumann, Lorey | SAP
Produktduplikaterkennung und Titelfusion | Robert Aschenbrenner | Naumann |


Topic | Student | Advisors | Last seen as/at
Manuelle Duplikaterkennung mittels Crowdsourcing (second place in German Best Master Degree Award by DGIQ) | David Wenzel | Vogel, Naumann | Capgemini
Context-aware Recommendations in Social Networks | Benjamin Emde | Abedjan, Naumann | GoEuro
Generating Query Suggestions by Exploiting Latent Semantics in Query Log | Fabian Lindenberg | Momtazi, Naumann | Ecotastic
Analyzing and Predicting Viral Tweets (published at MSND'13 workshop) | Maximilian Jenders | | PhD student, HPI
Erweiterung und Optimierung eines Graph-Clustering-Verfahrens | Eyk Kny | Böhm, Naumann | SAP Innovation Center Potsdam
Email Classification with Contextual Information | Michael Leben | Momtazi, Naumann |
Summarizing Extract-Transform-Load Workflows | MinhTuan Nguyen | Naumann, Albrecht | Software Developer, Otto Group
Automatic Data Normalization Using Pattern-Based Repairs | Sebastian Kölle | Naumann, Heise |


Topic | Student | Advisors | Last seen as/at
Effiziente Ähnlichkeitssuche in einer großen Menge von Zeichenketten mittels Key-Value-Store (published at SSDBM 2012) | Dandy Fenz | Lange, Naumann | Developer at madvertise Mobile Advertising GmbH
Automatisierte Konfiguration des D-Indexes zur Ähnlichkeitssuche | Matthias Pohl | Lange, Naumann | cpi GmbH, Berlin
HDRS: A Scalable Peer-to-Peer RDF Storage Infrastructure for Hadoop (published at I-Semantics 2012) | Daniel Hefenbrock | Böhm, Naumann | Developer at Microsoft
Überlappendes Clustering von Konzepten | Johannes Gosda | Naumann, Böhm | Developer at Pass Consultion
Advanced Service Discovery: Beyond Full-text Search | Thomas Berger | AbuJarour, Naumann | Inubit AG, Berlin
Concept matching in the Web of Data (published at LDOW 2012) | Toni Grütze | Böhm, Naumann | Research Assistant, HPI


Topic | Student | Advisors | Last seen as/at
Duplikaterkennung unter Verwendung unstrukturierter Anteile | David Sonnabend | Naumann, AbuJarour, Vogel | lead on GmbH
Extraction of Management Concepts from Web Sites for Sentiment Analysis | Arvid Heise | Naumann, Walgenbach (Jena) | Research Assistant, HPI
Wikipedia cross-lingual Concept Identification and Infobox Alignment (first place in IQ Best Master Degree Contest of DGIQ; published in Information Systems journal) | Daniel Rinser | Naumann, Lange |
Optimizing query execution to improve the energy efficiency of database management systems | Tobias Flach | Naumann | PhD student at UCSC
Discovering unique column combinations within a database (published at CIKM 2011) | Ziawasch Abedjan | Naumann | PostDoc, MIT
A flexible index structure for interactive data profiling | Ingmar Rötzler | Böhm, Naumann |
ETL process recommendation | Andriy Vedrych | Naumann, Albrecht |


Form | Topic | Student | Advisors | Last seen as/at
Master's thesis (external, FernUni Hagen) | Partitionierung zur effizienten Duplikaterkennung in relationalen Daten (first place in IQ Best Master Degree Contest of DGIQ; published at QDB 2009) | Uwe Draisbach | Naumann, Schlageter (U Hagen) | Research Assistant, HPI
Master's thesis | Mining Webservices for Metadata | Martin Probst | Kaufer, Naumann | Senior Software Engineer, EMC
Master's thesis | Conception and Development of a user-validated repository for user research information | Alexander Renneberg | Naumann | Consultant, d-labs
Diploma thesis (external, HU Berlin) | Nutzung von Statistiken über Daten-Overlap zur Anfrageoptimierung in Peer Data Management Systemen | Veronique Tietz | Roth, Naumann |
Master's thesis | Entwicklung von Sonderprofilen für die effektive Duplikaterkennung in der SCHUFA Personendatenbank | Christin Koitschka | Bleiholder, Weis, Naumann | Deutsche Bank
Diploma thesis (external, HU Berlin) | Merging Extract, Transform, Load Processes | Karsten Draba | Albrecht, Leser, Naumann |
Master's thesis | Erstellung eines validierten Konzepts für eine Web-basierte Anwendung zum Einholen von frühzeitigem Endnutzer-Feedback | Jörn Hartwig | Naumann | CEO, d-labs
Master's thesis | Automatic Identification and Collection of Web Services | Mircea Craculeac | AbuJarour, Naumann | Junior Software Engineer, neofonie
Master's thesis | Generierung von Web Services zur Kapselung mehrstufiger Webformulare (published at ICSOC 2009) | Tobias Vogel | Kaufer, Naumann | idealo, Berlin
Master's thesis | Adaptive Fenstergröße bei der Sorted Neighborhood Methode (published at ICDE 2012) | Oliver Wonneberg | Szott, Naumann | IT at Butter Lindner, Berlin
Master's thesis | Learning to Extract Structured Information from Wikipedia Articles to Populate Infoboxes (published at CIKM 2010) | Dustin Lange | Naumann, Böhm | Research Assistant, HPI
Master's thesis | Efficient Domain-Independent Planning using Declarative Programming | Murat Knecht | Naumann, Schaub (IfI) | Senior Software Developer at engageSPARK


Form | Topic | Student | Advisors | Last seen as/at
Master's thesis | Parallelisierung von Graphduplikaterkennung (published in TKDE 2011) | Maik Taubert | Weis, Szott, Naumann | Biotronik GmbH


Form | Topic | Student | Advisors | Last seen as/at
Diploma thesis (external, HU Berlin) | Automatisiertes Auffinden von Präfix- und Suffix-Inklusionsabhängigkeiten in relationalen (first place in IQ Best Master Degree Contest of DGIQ) | Jan Hegewald | Leser, Naumann, Bauckmann | Team lead at idealo internet GmbH
Diploma thesis (external, Uni Halle) | XML Schema Matching unter Verwendung der Tree Edit Distance | Sascha Szott | Naumann, Brass (U Halle), Weis, Kaufer | Zuse Institut (ZIB), Berlin

2005 – 2006 at Humboldt Universität

Form | Topic | Student | Advisors | Last seen as/at
Diploma thesis (external, HU Berlin) | Entwurf eines Peer Data Management Systems mit Steuerungs- und Simulationskomponente | Martin Schweigert | Naumann | IVU Traffic Tech. AG
Diploma thesis (external, HU Berlin) | Entwicklung einer Testumgebung für ein Peer Data Management | | Naumann | ID-Berlin
Diploma thesis (external, HU Berlin) | Tree-Edit-Distance based Schema Matching | | |
Diploma thesis (external, HU Berlin) | Schemaintegration auf der Grundlage von Schema-Mappings | Jana Bauckmann | Naumann |
Diploma thesis (external, HU Berlin) | Entwicklung von Klassifikation von Schema Mappings und deren Anwendung in einer Portallösung | Jens Harzer | Naumann | Wessendorf Software &
Diploma thesis (external, HU Berlin) | Duplikaterkennung in XML Daten mit der Sorted Neighborhood Methode (published at EDBT 2006) | Sven Puhlmann | Naumann | Center for Technology -
Diploma thesis (external, HU Berlin) | Kombination von Schema Matching Verfahren zur semi-automatischen Integration von Fahrzeug-Daten | Lenka Ivantysynova | Naumann |
Diploma thesis (external, HU Berlin) | Datentransformation mittels Schema Mapping (published at BTW 2007) | Frank Legler | Naumann | IBM Forschung & Entwicklung,