Prof. Dr. Felix Naumann

Please also see our (selected) presentations and dissertations.



  • PRISMA: A Privacy-Preserving Schema Matcher using Functional Dependencies
    Jan-Eric Hellenberg, Fabian Mahling, Lukas Laskowski, Felix Naumann, Matteo Paganelli, Fabian Panse
    Proceedings of the 28th International Conference on Extending Database Technology (EDBT), 2025 (to appear)




  • Shact: Disentangling and Clustering Latent Syntactic Structures from Transformer Encoders
    Alejandro Sierra-Múnera, Ralf Krestel
    Proceedings of the 29th International Conference on Natural Language & Information Systems (NLDB), 2024
    [Paper]  [GitHub]  [DOI:10.1007/978-3-031-70239-6_25]
  • An Introduction to Machine Learning from Time Series
    Anthony Bagnall, Matthew Middlehurst, Germain Forestier, Ali Ismail-Fawaz, Antoine Guillaume, David Guijo-Rubio, Arik Ermshaus, Patrick Schäfer, Thorsten Papenbrock, Phillip Wenig, Sebastian Schmidl
    Proceedings of the European Conference on Machine Learning and Data Mining (ECML PKDD), 2024 (to appear)
  • AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data
    Sebastian Schmidl, Felix Naumann, Thorsten Papenbrock
    PVLDB 17:(11), 2024
    [Paper]  [vldb]  [Project Page]  [DOI:10.14778/3681954.3681978]
  • Anomaly Detectors for Multivariate Time Series: The Proof of the Pudding is in the Eating
    Phillip Wenig, Sebastian Schmidl, Thorsten Papenbrock
    Proceedings of the International Conference on Data Engineering Workshops (ICDEW), 2024
    [Paper]  [DOI:10.1109/ICDEW61823.2024.00018]
  • The Effects of Data Quality on Named Entity Recognition
    Divya Bhadauria, Alejandro Sierra-Múnera, Ralf Krestel
    Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024), 2024
    [Paper]  [GitHub] 
  • Determining the Largest Overlap between Tables
    Luca Zecchini, Tobias Bleifuß, Giovanni Simonini, Sonia Bergamaschi, Felix Naumann
    Proceedings of the ACM on Management of Data (PACMMOD) (2024)
  • Discovering Functional Dependencies through Hitting Set Enumeration
    Tobias Bleifuß, Thorsten Papenbrock, Thomas Bläsius, Martin Schirneck, Felix Naumann
    Proceedings of the ACM on Management of Data (PACMMOD) (2024)
  • TASHEEH: Repairing Row-Structure in Raw CSV Files
    Mazhar Hameed, Gerardo Vitagliano, Fabian Panse, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2024
    [Paper]  [DOI:10.48786/edbt.2024.37]
  • Efficient Discovery of Temporal Inclusion Dependencies in Wikipedia Tables
    Leon Bornemann, Tobias Bleifuß, Dmitri V. Kalashnikov, Fatemeh Nargesian, Felix Naumann, Divesh Srivastava
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2024
    [Paper]  [DOI:10.48786/edbt.2024.35]
  • Discovering Denial Constraints in Dynamic Datasets
    Eduardo Pena, Fabio Porto, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2024
    [Paper]  [IEEE] 




  • MORPHER: Structural Transformation of ill-formed Rows
    Mazhar Hameed, Gerardo Vitagliano, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2023
  • Efficient Ultrafine Typing of Named Entities
    Alejandro Sierra-Múnera, Jan Westphal, Ralf Krestel
    Proceedings of the Joint Conference on Digital Libraries (JCDL), 2023
    [Paper]  [DOI:10.1109/JCDL57899.2023.00038]
  • Pollock: A Data Loading Benchmark
    Gerardo Vitagliano, Mazhar Hameed, Lan Jiang, Lucas Reisener, Eugene Wu, Felix Naumann
    PVLDB 16:(8), 2023
  • BCNF* - From Normalized- to Star-Schemas and Back Again (demo)
    Marie Fischer, Paul Roessler, Paul Sieben, Janina Adamcic, Christoph Kirchherr, Tobias Sträubig, Youri Kaminsky, Felix Naumann
    Proceedings of Companion of the 2023 International Conference on Management of Data (SIGMOD-Companion), 2023
    [Paper]  [Project Page]  [DOI:10.1145/3555041.3589712]
  • Detecting Stale Data in Wikipedia Infoboxes
    Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2023
    [Paper]  [Project Page] 
  • DPQL: The Data Profiling Query Language
    Marcian Seeger, Sebastian Schmidl, Alexander Vielhauer, Thorsten Papenbrock
    Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2023
    [Paper]  [DOI:10.18420/BTW2023-19]
  • HYPEX: Hyperparameter Optimization in Time Series Anomaly Detection
    Sebastian Schmidl, Phillip Wenig, Thorsten Papenbrock
    Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2023
    [Paper]  [Project Page]  [DOI:10.18420/BTW2023-22]
  • ExtracTable: Extracting Tables from Raw Data Files
    Leonardo Hübscher, Lan Jiang, Felix Naumann
    Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2023
    [Paper]  [Project Page]  [DOI:10.18420/BTW2023-20]
  • Discovering Similarity Inclusion Dependencies
    Youri Kaminsky, Eduardo Pena, Felix Naumann
    Proceedings of the ACM on Management of Data (PACMMOD) (2023)
    [Paper]  [Project Page]  [DOI:10.1145/3588929]
  • Matching Roles from Temporal Data - Why Joe Biden is not only President, but also Commander-in-Chief
    Leon Bornemann, Tobias Bleifuß, Dmitri V. Kalashnikov, Fatemeh Nargesian, Felix Naumann, Divesh Srivastava
    Proceedings of the ACM on Management of Data (PACMMOD) (2023)
  • Fast Algorithms for Denial Constraint Discovery
    Eduardo Pena, Fabio Porto, Felix Naumann
    PVLDB 16:(4), 2023




  • The Effects of Data Quality on Machine Learning Performance
    Lukas Budach, Moritz Feuerpfeil, Nina Ihde, Andrea Nathansen, Nele Noack, Hendrik Patzlaff, Felix Naumann, Hazar Harmouch
    arXiv (2022)
  • Discovering Fine-Grained Semantics in Knowledge Graph Relations
    Nitisha Jain, Ralf Krestel
    Proceedings of the Thirty-First ACM International Conference on Information and Knowledge Management (CIKM), 2022
  • Structural embedding of data files with MaGRiTTE
    Gerardo Vitagliano, Mazhar Hameed, Felix Naumann
    Table Representation Learning Workshop at NeurIPS (TRL@NeurIPS), 2022
    [Paper]  [Project]
  • Art Creation with Multi-Conditional StyleGANs
    Konstantin Dobler, Florian Hübscher, Jan Westphal, Alejandro Sierra-Múnera, Gerard de Melo, Ralf Krestel
    Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), 2022
    [IJCAI]  [Extended arXiv Version]  [DOI:10.24963/ijcai.2022/684]
  • Generation of Training Data for Named Entity Recognition of Artworks
    Nitisha Jain, Alejandro Sierra-Múnera, Jan Ehmueller, Ralf Krestel
    Semantic Web Journal (Special Issue Cultural Heritage 2021) (2022)
  • Mondrian: Spreadsheet Layout Detection
    Gerardo Vitagliano, Lucas Reisener, Lan Jiang, Mazhar Hameed, Felix Naumann
    Proceedings of the International Conference on Management of Data (SIGMOD) (demo), 2022
    [Paper]  [ACM]  [DOI:10.1145/3514221.3520152]
  • TimeEval: A Benchmarking Toolkit for Time Series Anomaly Detection Algorithms
    Phillip Wenig, Sebastian Schmidl, Thorsten Papenbrock
    PVLDB 15:(12), 2022
    [Paper]  [Project Page]  [DOI:10.14778/3554821.3554873]
  • Frost: A Platform for Benchmarking and Exploring Data Matching Results (industry paper)
    Martin Graf, Lukas Laskowski, Florian Papsdorf, Florian Sold, Roland Gremmelspacher, Felix Naumann, Fabian Panse
    PVLDB 15:(12), 2022
    [Paper]  [Project Page] 
  • Anomaly Detection in Time Series: A Comprehensive Evaluation
    Sebastian Schmidl, Phillip Wenig, Thorsten Papenbrock
    PVLDB 15:(9), 2022
    [Paper]  [Poster]  [Project Page]  [DOI:10.14778/3538598.3538602]
  • Data Errors: Symptoms, Causes and Origins
    Ihab Ilyas, Felix Naumann
    Data Engineering Bulletin 45:(1), 2022
  • Relation Canonicalization in Open Knowledge Graphs: A Quantitative Analysis
    Maria Lomaeva, Nitisha Jain
    Proceedings of the Extended Semantic Web Conference, Posters and Demos (ESWC), 2022
  • Generating Domain-Specific Knowledge Graphs: Challenges with Open Information Extraction
    Nitisha Jain, Alejandro Sierra-Múnera, Philipp Schmidt, Julius Streit, Simon Thormeyer, Maria Lomaeva, Ralf Krestel
    Proceedings of the International Workshop on Knowledge Graph Generation from Text at ESWC, 2022
  • AI Compliance - Challenges of Bridging Data Science and Law
    Philipp Hacker, Felix Naumann, Tobias Friedrich, Stefan Grundmann, Anja Lehmann, Herbert Zech
    Journal of Data and Information Quality (JDIQ) (2022)
    [DOI (open access)] 
  • SURAGH: Syntactic Pattern Matching to Identify Ill-Formed Records
    Mazhar Hameed, Gerardo Vitagliano, Lan Jiang, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2022
  • Mining Change Rules
    Daniel Lindner, Franziska Schumann, Nicolas Alder, Tobias Bleifuß, Leon Bornemann, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2022
  • DataGossip: A Data Exchange Extension for Distributed Machine Learning Algorithms
    Phillip Wenig, Thorsten Papenbrock
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2022
    [Paper]  [GitHub]  [DOI:10.48786/edbt.2022.24]
  • Aggregation Detection in CSV Files
    Lan Jiang, Gerardo Vitagliano, Mazhar Hameed, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2022
  • Detecting Layout Templates in Complex Multiregion Files
    Gerardo Vitagliano, Lan Jiang, Felix Naumann
    PVLDB 15:(3), 2022
    [Paper]  [ACM]  [DOI:10.14778/3494124.3494145]
  • Entity Resolution On-Demand
    Giovanni Simonini, Luca Zecchini, Sonia Bergamaschi, Felix Naumann
    PVLDB 15:(7), 2022
  • Fast Detection of Denial Constraint Violations
    Eduardo H. M. Pena, Eduardo C. de Almeida, Felix Naumann
    PVLDB 15:(4), 2022
    [VLDB]  [DOI:10.14778/3503585.3503595]
  • Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization
    Jan Kossmann, Felix Naumann, Daniel Lindner, Thorsten Papenbrock
    Proceedings of the International Conference on Innovative Database Research (CIDR), 2022
  • Efficient Distributed Discovery of Bidirectional Order Dependencies
    Sebastian Schmidl, Thorsten Papenbrock
    The VLDB Journal (2022)
    [Paper]  [Poster]  [Project Page]  [DOI:10.1007/s00778-021-00683-4]
  • Data dependencies for query optimization: a survey
    Jan Kossmann, Thorsten Papenbrock, Felix Naumann
    The VLDB Journal (2022)
    [Paper]  [pdf]  [doi] 




  • How Inclusive are We? An Analysis of Gender Diversity in Database Venues
    Angela Bonifati, Michael J. Mior, Felix Naumann, Nele Sina Noack
    SIGMOD Record 50:(4), 2021
    [Paper]  [ACM] 
  • VLDB 2021: Designing a Hybrid Conference
    Philippe Bonnet, Xin Luna Dong, Felix Naumann, Pinar Tözün
    SIGMOD Record 50:(4), 2021
    [Paper]  [ACM] 
  • Did You Enjoy the Last Supper? An Experimental Study on Cross-Domain NER Models for the Art Domain
    Alejandro Sierra-Múnera, Ralf Krestel
    Proceedings of the Workshop on Natural Language Processing for Digital Humanities (NLP4DH@ICON), 2021
    [Paper]  [GitHub] 
  • Novel Views on Novels: Embedding Multiple Facets of Long Texts
    Lasse Kohlmeyer, Tim Repke, Ralf Krestel
    Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2021
    [Paper]  [GitHub Code]  [GitHub Thesis]  [DOI:10.1145/3486622.3494006]
  • Interactive Curation of Semantic Representations in Digital Libraries
    Tim Repke, Ralf Krestel
    Proceedings of the International Conference on Asia-Pacific Digital Libraries (ICADL), 2021
  • The Secret Life of Wikipedia Tables
    Tobias Bleifuß, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    Proceedings of the Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA-Data@VLDB), 2021
    [Paper]  [CEUR-WS]  [Project] 
  • Improving Knowledge Graph Embeddings with Ontological Reasoning
    Nitisha Jain, Trung-Kien Tran, Mohamed H. Gad-Elrab, Daria Stepanova
    Proceedings of the International Semantic Web Conference (ISWC), 2021
  • Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format
    Julian Risch, Philipp Schmidt, Ralf Krestel
    Proceedings of the Workshop on Online Abuse and Harms (WOAH@ACL), 2021
    [Paper]  [GitHub] 
  • Extraction and Representation of Financial Entities from Text
    Tim Repke, Ralf Krestel
    Data Science for Economics and Finance. Springer, 2021
    [Chapter]  [Springer]  [DOI:10.1007/978-3-030-66891-4_11]
  • CrashNet: an encoder–decoder architecture to predict crash test outcomes
    Mohamed Karim Belaid, Maximilian Rabus, Ralf Krestel
    Data Mining and Knowledge Discovery (2021)
    [Springer]  [DOI:10.1007/s10618-021-00761-9]
  • Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks
    Robert Schwanhold, Tim Repke, Ralf Krestel
    Proceedings of the International Workshop on Computational Approaches to Historical Language Change (LChange@ACL), 2021
    [Paper]  [Project]  [DOI:10.18653/v1/2021.lchange-1.8]
  • Evaluation of Duplicate Detection Algorithms: From Quality Measures to Test Data Generation (tutorial)
    Fabian Panse, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2021
  • Distributed detection of sequential anomalies in univariate time series
    Johannes Schneider, Phillip Wenig, Thorsten Papenbrock
    The VLDB Journal (2021)
    [Paper]  [Poster]  [Project Page]  [DOI:10.1007/s00778-021-00657-6]
  • Multifaceted Domain-Specific Document Embeddings
    Julian Risch, Philipp Hager, Ralf Krestel
    Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations) (NAACL), 2021
    [Paper]  [Project Page] 
  • Optimized Theta-Join Processing
    Julian Weise, Sebastian Schmidl, Thorsten Papenbrock
    Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2021
    [Paper]  [Project Page]  [DOI:10.18420/btw2021-03]
  • Do Embeddings Actually Capture Knowledge Graph Semantics?
    Nitisha Jain, Jan-Christoph Kalo, Wolf-Tilo Balke, Ralf Krestel
    Proceedings of the Extended Semantic Web Conference (ESWC), 2021
    [Paper]  [URL]  [DOI:10.1007/978-3-030-77385-4_9]
  • Structured Object Matching across Web Page Revisions
    Tobias Bleifuß, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    Proceedings of the International Conference on Data Engineering (ICDE), 2021
    [Paper]  [IEEE]  [Project]  [DOI:10.1109/ICDE51399.2021.00115]
  • ComEx: Comment Exploration on Online News Platforms
    Julian Risch, Tim Repke, Lasse Kohlmeyer, Ralf Krestel
    Joint Proceedings of the ACM IUI Workshops co-located with the ACM Conference on Intelligent User Interfaces (IUI), 2021
    [Paper]  [GitHub]  [Project]  [CEUR-WS] 
  • Relational Header Discovery using Similarity Search in a Table Corpus
    Hazar Harmouch, Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2021
  • Structure Detection in Verbose CSV Files
    Lan Jiang, Gerardo Vitagliano, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2021
    [Paper]  [GitHub]  [DOI:10.5441/002/edbt.2021.18]
  • Discovering Relaxed Functional Dependencies based on Multi-attribute Dominance
    Loredana Caruccio, Vincenzo Deufemia, Felix Naumann, Giuseppe Polese
    Transactions on Knowledge and Data Engineering (TKDE) 33:(9), 2021
    [IEEE]  [DOI:10.1109/TKDE.2020.2967722]
  • Few-Shot Knowledge Validation using Rules
    Michael Loster, Davide Mottin, Paolo Papotti, Felix Naumann, Jan Ehmueller, Benjamin Feldmann
    Proceedings of The Web Conference (WWW), 2021
  • PatentMatch: A Dataset for Matching Patent Claims with Prior Art
    Julian Risch, Nicolas Alder, Christoph Hewel, Ralf Krestel
    Proceedings of the Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech@SIGIR), 2021
    [Paper]  [Project Page]  [CEUR-WS] 
  • Robust Visualisation of Dynamic Text Collections: Measuring and Comparing Dimensionality Reduction Algorithms
    Tim Repke, Ralf Krestel
    Proceedings of the Conference on Human Information Interaction and Retrieval (CHIIR), 2021
    [Paper]  [DOI:10.1145/3406522.3446034]
  • Ein Data Engineering Kurs für 10.000 Teilnehmer
    Nicolas Alder, Tobias Bleifuß, Leon Bornemann, Felix Naumann, Tim Repke
    Datenbank-Spektrum 20:(1), 2021
    [Article]  [Springer]  [openHPI]  [DOI:10.1007/s13222-020-00354-8]
  • Knowledge Transfer for Entity Resolution with Siamese Neural Networks
    Michael Loster, Ioannis Koumarelas, Felix Naumann
    Journal of Data and Information Quality (JDIQ) 13:(1), 2021




  • Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections
    Nitisha Jain, Christian Bartz, Tobias Bredow, Emanuel Metzenthin, Jona Otholt, Ralf Krestel
    Proceedings of the International Workshop on Fine Art Pattern Extraction and Recognition (FAPER@ICPR), 2020
    [Paper]  [Springer]  [DOI:10.1007/978-3-030-68796-0_37]
  • HyCoNN: Hybrid Cooperative Neural Networks for Personalized News Discussion Recommendation
    Julian Risch, Victor Künstler, Ralf Krestel
    Proceedings of the International Joint Conferences on Web Intelligence and Intelligent Agent Technologies (WI-IAT), 2020
    [Paper]  [GitHub]  [DOI:10.1109/WIIAT50758.2020.00011]
  • Learning Fine-Grained Semantics for Multi-Relational Data
    Nitisha Jain, Ralf Krestel
    Proceedings of the International Semantic Web Conference, Posters and Demos (ISWC), 2020
    [Paper]  [Poster] 
  • Data Preparation: A Survey of Commercial Tools
    Mazhar Hameed, Felix Naumann
    SIGMOD Record 49:(3), 2020
    [Paper]  [ACM]  [DOI:10.1145/3444831.3444835]
  • Efficient Detection of Data Dependency Violations
    Eduardo H. M. Pena, Edson R. L. Filho, Eduardo C. de Almeida, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2020
    [Paper]  [DOI:10.1145/3340531.3412062]
  • Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery
    Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, Martin Schirneck
    PVLDB 13:(11), 2020
    [Paper]  [DOI:10.14778/3407790.3407824]
  • A Dataset of Journalists' Interactions with Their Readership: When Should Article Authors Reply to Reader Comments?
    Julian Risch, Ralf Krestel
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2020
    [Paper]  [GitHub]  [DOI:10.1145/3340531.3412764]
  • Dynamic Channel and Layer Gating in Convolutional Neural Networks
    Ali Ehteshami Bejnordi, Ralf Krestel
    Proceedings of the German Conference on Artificial Intelligence (KI), 2020
    [Paper]  [DOI:10.1007/978-3-030-58285-2_3]
  • Sense Tree: Discovery of New Word Senses with Graph-based Scoring
    Jan Ehmüller, Lasse Kohlmeyer, Holly McKee, Daniel Paeschke, Tim Repke, Ralf Krestel, Felix Naumann
    Lernen, Wissen, Daten, Analysen (LWDA), 2020
    [Paper]  [CEUR-WS]  [Project] 
  • Multimodal Knowledge Graphs for Semantic Analysis of Cultural Heritage Data
    Nitisha Jain
    Invited Talk at the Workshop on Knowledge Bases and Multiple Modalities (KBMM@AKBC), 2020
  • Efficient Discovery of Matching Dependencies
    Philipp Schirmer, Thorsten Papenbrock, Ioannis Koumarelas, Felix Naumann
    Transactions on Database Systems (TODS) 45:(3), 2020
    [Paper]  [DOI:10.1145/3392778]
  • Explaining Offensive Language Detection
    Julian Risch, Robin Ruff, Ralf Krestel
    Journal for Language Technology and Computational Linguistics (JLCL) 34:(1), 2020
    [Paper]  [GitHub]  [Publisher] 
  • Discovering Biased News Articles Leveraging Multiple Human Annotations
    Konstantina Lazaridou, Alexander Löser, Maria Mestre, Felix Naumann
    Proceedings of the Conference on Language Resources and Evaluation (LREC), 2020
    [Paper]
  • Offensive Language Detection Explained
    Julian Risch, Robin Ruff, Ralf Krestel
    Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@LREC), 2020
    [Paper]  [GitHub]  [ACL] 
  • Hierarchical Document Classification as a Sequence Generation Task
    Julian Risch, Samuele Garda, Ralf Krestel
    Proceedings of the Joint Conference on Digital Libraries (JCDL), 2020
    [Paper]  [GitHub]  [DOI:10.1145/3383583.3398538]
  • RHEEMix in the Data Jungle: A Cost-based Optimizer for Cross-Platform Systems
    Sebastian Kruse, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Sanjay Chawla, Felix Naumann, Bertty Contreras-Rojas
    The VLDB Journal 29:(6), 2020
    [URL]  [DOI:10.1007/s00778-020-00612-x]
  • Bagging BERT Models for Robust Aggression Identification
    Julian Risch, Ralf Krestel
    Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@LREC), 2020
    [Paper]  [GitHub] 
  • Domain-Specific Knowledge Graph Construction for Semantic Analysis
    Nitisha Jain
    Proceedings of the Extended Semantic Web Conference (ESWC), 2020
    [Paper]  [URL]  [DOI:10.1007/978-3-030-62327-2_40]
  • Automatic Matching of Paintings and Descriptions in Art-Historic Archives using Multimodal Analysis
    Nitisha Jain, Christian Bartz, Ralf Krestel
    Proceedings of the International Workshop on Artificial Intelligence for Historical Image Enrichment and Access (AI4HI@LREC), 2020
    [Paper]  [URL] 
  • Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions
    Julian Risch, Ralf Krestel
    Proceedings of the International Conference on Web and Social Media (ICWSM), 2020
    [Paper]  [GitHub] 
  • Visualising Large Document Collections by Jointly Modeling Text and Network Structure
    Tim Repke, Ralf Krestel
    Proceedings of the Joint Conference on Digital Libraries (JCDL), 2020
    [Paper]  [Project]  [DOI:10.1145/3383583.3398524]
  • Exploration Interface for Jointly Visualised Text and Graph Data
    Tim Repke, Ralf Krestel
    Proceedings of the International Conference on Intelligent User Interfaces Companion (IUI), 2020
    [Paper]  [Project]  [DOI:10.1145/3379336.3381470]
  • Natural Key Discovery in Wikipedia Tables
    Leon Bornemann, Tobias Bleifuß, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    Proceedings of The Web Conference (WWW), 2020
    [Paper]  [DOI:10.1145/3366423.3380039]
  • Data Preparation for Duplicate Detection
    Ioannis Koumarelas, Lan Jiang, Felix Naumann
    Journal of Data and Information Quality (JDIQ) 12:(3), 2020
  • Explainable AI under Contract and Tort Law: Legal Incentives and Technical Challenges
    Philipp Hacker, Ralf Krestel, Stefan Grundmann, Felix Naumann
    Artificial Intelligence and Law 28:(4), 2020
    [Paper]  [DOI:10.1007/s10506-020-09260-6]
  • MDedup: Duplicate Detection with Matching Dependencies
    Ioannis Koumarelas, Thorsten Papenbrock, Felix Naumann
    PVLDB 13:(5), 2020
    [Paper]  [DOI:10.14778/3377369.3377379]
  • Holistic Primary Key and Foreign Key Detection
    Lan Jiang, Felix Naumann
    Journal of Intelligent Information Systems 54:(3), 2020
    [Paper]  [DOI:10.1007/s10844-019-00562-z]
  • Toxic Comment Detection in Online Discussions
    Julian Risch, Ralf Krestel
    Deep Learning-Based Approaches for Sentiment Analysis. Springer, 2020
    [Paper]  [DOI:10.1007/978-981-15-1216-2]




  • An Actor Database System for Akka
    Sebastian Schmidl, Frederic Schneider, Thorsten Papenbrock
    Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW) - Workshopband, 2019
    [Paper]  [DOI:10.18420/btw2019-ws-23]
  • Coverage of Information Extraction from Sentences and Paragraphs
    Simon Razniewski, Nitisha Jain, Paramita Mirza, Gerhard Weikum
    Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
    [Paper]  [ACL Web]  [DOI:10.18653/v1/D19-1583]
  • Discovery of Approximate (and Exact) Denial Constraints
    Eduardo H. M. Pena, Eduardo C. de Almeida, Felix Naumann
    PVLDB 13:(3), 2019
    [Paper]  [DOI:10.14778/3368289.3368293]
  • hpiDEDIS at GermEval 2019: Offensive Language Identification using a German BERT model
    Julian Risch, Anke Stoll, Marc Ziegele, Ralf Krestel
    Proceedings of the Conference on Natural Language Processing (KONVENS), 2019
    [Paper]  [GitHub] 
  • A Scoring-based Approach for Data Preparator Suggestion
    Lan Jiang, Gerardo Vitagliano, Felix Naumann
    Lernen, Wissen, Daten, Analysen (LWDA), 2019
  • Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms
    Falco Dürsch, Axel Stebner, Fabian Windheuser, Maxi Fischer, Tim Friedrich, Nils Strelow, Tobias Bleifuß, Hazar Harmouch, Lan Jiang, Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2019
    [Paper]  [Code]  [DOI:10.1145/3357384.3357916]
  • Transforming Pairwise Duplicates to Entity Clusters for High Quality Duplicate Detection
    Uwe Draisbach, Peter Christen, Felix Naumann
    Journal of Data and Information Quality (JDIQ) 12:(1), 2019
    [Paper]  [DOI:10.1145/3352591]
  • Who is Mona L.? Identifying Mentions of Artworks in Historical Archives
    Nitisha Jain, Ralf Krestel
    International Conference on Theory and Practice of Digital Libraries (TPDL), 2019
    [Paper]  [Springer]  [DOI:10.1007/978-3-030-30760-8_10]
  • Mining Business Relationships from Stocks and News
    Thomas Kellermeier, Tim Repke, Ralf Krestel
    Proceedings of the Workshop on Mining Data for Financial Applications (MIDAS@ECML-PKDD), 2019
    [Paper]  [DOI:10.1007/978-3-030-37720-5_6]
  • DynFD: Functional Dependency Discovery in Dynamic Datasets
    Philipp Schirmer, Thorsten Papenbrock, Sebastian Kruse, Felix Naumann, Dennis Hempfing, Torben Mayer, Daniel Neuschäfer-Rube
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2019
    [Paper]  [DOI:10.5441/002/edbt.2019.23]
  • Measuring and Facilitating Data Repeatability in Web Science
    Julian Risch, Ralf Krestel
    Datenbank-Spektrum 19:(2), 2019
    [Paper]  [GitHub]  [DOI:10.1007/s13222-019-00316-9]
  • Domain-specific word embeddings for patent classification
    Julian Risch, Ralf Krestel
    Data Technologies and Applications 53:(1), 2019
    [Paper]  [Project Page]  [DOI:10.1108/DTA-01-2019-0002]
  • The relational database management systems genealogy
    Felix Naumann
    Making Databases Work. ACM / Morgan & Claypool, 2019
    [Paper]  [DOI:10.1145/3226595.3226611]
  • Optimizing Cross-Platform Data Movement
    Sebastian Kruse, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Sanjay Chawla, Felix Naumann, Bertty Contreras-Rojas
    Proceedings of the International Conference on Data Engineering (ICDE), 2019
    [Paper]  [DOI:10.1109/ICDE.2019.00162]
  • DBChEx: Interactive Exploration of Data and Schema Change
    Tobias Bleifuß, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2019
    [Paper]  [CIDRDB] 




  • CurEx: A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text
    Michael Loster, Felix Naumann, Jan Ehmueller, Benjamin Feldmann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2018
    [Paper]  [DOI:10.1145/3269206.3269229]
  • Dissecting Company Names using Sequence Labeling
    Michael Loster, Manuel Hegner, Felix Naumann, Ulf Leser
    Lernen, Wissen, Daten, Analysen (LWDA), 2018
    [Paper]
  • Towards Progressive Search-driven Entity Resolution
    Alberto Pietrangelo, Giovanni Simonini, Sonia Bergamaschi, Felix Naumann, Ioannis Koumarelas
    Italian Symposium on Advanced Database Systems (SEBD), 2018
    [Paper]
  • Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection
    Ioannis Koumarelas, Axel Kroschk, Clifford Mosley, Felix Naumann
    Journal of Data and Information Quality (JDIQ) 10:(2), 2018
    [Paper]  [DOI:10.1145/3232852]
  • The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities
    Michael Loster, Tim Repke, Ralf Krestel, Felix Naumann, Jan Ehmueller, Benjamin Feldmann, Oliver Maspfuhl
    Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM), 2018
    [Paper]  [DOI:10.1145/3220547.3220553]
  • Data Profiling
    Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock
    Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2018
    [M&C]  [DOI:10.2200/S00878ED1V01Y201810DTM052]
  • Exploring Change - A New Dimension of Data Analytics
    Tobias Bleifuß, Leon Bornemann, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, Divesh Srivastava
    PVLDB 12:(2), 2018
    [Paper]  [PVLDB]  [DOI:10.14778/3282495.3282496]
  • Book Recommendation Beyond the Usual Suspects: Embedding Book Plots Together with Place and Time Information
    Julian Risch, Samuele Garda, Ralf Krestel
    Proceedings of the International Conference On Asia-Pacific Digital Libraries (ICADL), 2018
    [Paper]  [GitHub]  [DOI:10.1007/978-3-030-04257-8_24]
  • Fine-Grained Classification of Offensive Language
    Julian Risch, Eva Krebs, Alexander Löser, Alexander Riese, Ralf Krestel
    Proceedings of GermEval (co-located with KONVENS), 2018
  • Learning Patent Speak: Investigating Domain-Specific Word Embeddings
    Julian Risch, Ralf Krestel
    Proceedings of the International Conference on Digital Information Management (ICDIM), 2018
    [Paper]  [Project Page]  [DOI:10.1109/ICDIM.2018.8846972]
  • Challenges for Toxic Comment Classification: An In-Depth Error Analysis
    Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser
    Proceedings of the Workshop on Abusive Language Online (ALW@EMNLP), 2018
    [Paper]  [DOI:10.18653/v1/w18-5105]
  • Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora
    Tim Repke, Ralf Krestel, Jakob Edding, Moritz Hartmann, Jonas Hering, Dennis Kipping, Hendrik Schmidt, Nico Scordialo, Alexander Zenner
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2018
    [Paper v1]  [Paper v2]  [Project]  [DOI:10.1145/3269206.3269231]
  • RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -
    Divy Agrawal, Sanjay Chawla, Zoi Kaoudi, Sebastian Kruse, Jorge Arnulfo Quiané-Ruiz, Bertty Contreras-Rojas, Ahmed Elmagarmid, Yasser Idris, Ji Lucas, Essam Mansour, Mourad Ouzzani, Paolo Papotti, Nan Tang, Saravanan Thirumuruganathan, Anis Troudi
    PVLDB 11:(11), 2018
    [Paper]  [DOI:10.14778/3236187.3236195]
  • Piggyback Profiling: Enhancing Query Results with Metadata
    Claudia Exeler, Maria Graber, Tino Junge, Stefan Ramson, Cathleen Ramson, Fabian Tschirschnitz, Felix Naumann
    Lernen, Wissen, Daten, Analysen (LWDA), 2018
  • Aggression Identification Using Deep Learning and Data Augmentation
    Julian Risch, Ralf Krestel
    Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@COLING), 2018
    [Paper]  [GitHub] 
  • Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom
    Julian Risch, Ralf Krestel
    Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@COLING), 2018
  • Data Change Exploration using Time Series Clustering
    Leon Bornemann, Tobias Bleifuß, Dmitri Kalashnikov, Felix Naumann, Divesh Srivastava
    Datenbank-Spektrum 18:(2), 2018
    [Paper]  [DOI:10.1007/s13222-018-0285-x]
  • WELDA: Enhancing Topic Models by Incorporating Local Word Contexts
    Stefan Bunk, Ralf Krestel
    Proceedings of the Joint Conference on Digital Libraries (JCDL), 2018
    [Paper]  [DOI:10.1145/3197026.3197043]
  • Prediction for the Newsroom: Which Articles Will Get the Most Comments?
    Carl Ambroselli, Julian Risch, Ralf Krestel, Andreas Loos
    Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2018
    [Paper]  [GitHub]  [DOI:10.18653/v1/n18-3024]
  • Where in the World Is Carmen Sandiego? Detecting Person Locations via Social Media Discussions
    Konstantina Lazaridou, Toni Gruetze, Felix Naumann
    Proceedings of the ACM Conference on Web Science (WebSci), 2018
    [Paper]  [URL]  [DOI:10.1145/3201064.3201068]
  • Efficient Discovery of Approximate Dependencies
    Sebastian Kruse, Felix Naumann
    PVLDB 11:(7), 2018
    [Paper]  [Errata]  [DOI:10.14778/3192965.3192968]
  • My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections
    Julian Risch, Ralf Krestel
    Proceedings of the Joint Conference on Digital Libraries (JCDL), 2018
    [Paper]  [GitHub]  [arXiv]  [DOI:10.1145/3197026.3197038]
  • Discovery of Genuine Functional Dependencies from Relational Data with Missing Values
    Laure Berti-Equille, Hazar Harmouch, Felix Naumann, Noel Novelli, Saravanan Thirumuruganathan
    PVLDB, 2018
    [Paper]  [Paper]  [DOI:10.14778/3204028.3204032]
  • Topic-aware Network Visualisation to Explore Large Email Corpora
    Tim Repke, Ralf Krestel
    International Workshop on Big Data Visual Exploration and Analytics (BigVis), 2018
    [Paper]  [Project] 
  • Data Quality – The Role of Empiricism
    Shazia Sadiq, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, Divesh Srivastava
    SIGMOD Record 46:(4), 2018
    [Paper]  [DOI:10.1145/3186549.3186559]
  • Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks
    Tim Repke, Ralf Krestel
    Proceedings of the European Conference on Information Retrieval (ECIR), 2018
    [Paper]  [Project]  [DOI:10.1007/978-3-319-76941-7_9]




  • Metacrate: Organize and Analyze Millions of Data Profiles
    Sebastian Kruse, David Hahn, Marius Walter, Felix Naumann
    Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2017
    [Paper]  [DOI:10.1145/3132847.3133180]
  • Cardinality Estimation: An Experimental Survey
    Hazar Harmouch, Felix Naumann
    PVLDB, 2017
    [Paper]  [Paper]  [DOI:10.1145/3164135.3164145]
  • Detecting Inclusion Dependencies on Very Many Tables
    Fabian Tschirschnitz, Thorsten Papenbrock, Felix Naumann
    Transactions on Database Systems (TODS) 42:(3), 2017
    [Paper]  [DOI:10.1145/3105959]
  • ssHMM: Extracting Intuitive Sequence-Structure Motifs from High-Throughput RNA-Binding Protein Data
    David Heller, Ralf Krestel, Uwe Ohler, Martin Vingron, Annalisa Marsico
    Nucleic Acids Research 45:(19), 2017
  • Efficient Denial Constraint Discovery with Hydra
    Tobias Bleifuß, Sebastian Kruse, Felix Naumann
    PVLDB 11:(3), 2017
    [Paper]  [PVLDB]  [DOI:10.14778/3157794.3157800]
  • Effect of a Website That Presents Patients' Experiences on Self-Efficacy and Patient Competence of Colorectal Cancer Patients: Web-Based Randomized Controlled Trial
    M. Jürgen Giesler, Bettina Keller, Tim Repke, Rainer Leonhart, Joachim Weis, Rebecca Muckelbauer, Nina Rieckmann, Jacqueline Müller-Nordhorn, Gabriele Lucius-Hoene, Christine Holmberg
    Journal of Medical Internet Research (JMIR) 19:(10), 2017
    [JMIR]  [DOI:10.2196/jmir.7639]
  • Identifying Media Bias by Analyzing Reported Speech
    Konstantina Lazaridou, Ralf Krestel, Felix Naumann
    Proceedings of the International Conference on Data Mining (ICDM), 2017
    [IEEE]  [DOI:10.1109/ICDM.2017.119]
  • Real or Fake? Large-Scale Validation of Identity Leaks
    Fabian Maschler, Fabio Niephaus, Julian Risch
    Jahrestagung der Gesellschaft für Informatik (INFORMATIK), 2017
    [Paper]  [DOI:10.18420/in2017_248]
  • Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types
    Zhe Zuo, Michael Loster, Ralf Krestel, Felix Naumann
    Lernen, Wissen, Daten, Analysen (LWDA), 2017
  • How Do Search Engines Work? A Massive Open Online Course with 4000 Participants
    Ralf Krestel, Julian Risch
    Lernen, Wissen, Daten, Analysen (LWDA), 2017
  • Improving Company Recognition from Unstructured Text by using Dictionaries
    Michael Loster, Zhe Zuo, Felix Naumann, Oliver Maspfuhl, Dirk Thomas
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2017
    [Paper]  [DOI:10.5441/002/edbt.2017.82]
  • What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers
    Julian Risch, Ralf Krestel
    Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL), 2017
    [Paper]  [GitHub] 
  • Enabling Change Exploration (Vision)
    Tobias Bleifuß, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, Vladislav Shkapenyuk, Divesh Srivastava
    Proceedings of the Fourth International Workshop on Exploratory Search in Databases and the Web (ExploreDB), 2017
    [Paper]  [DOI:10.1145/3077331.3077340]
  • Fast Approximate Discovery of Inclusion Dependencies
    Sebastian Kruse, Thorsten Papenbrock, Christian Dullweber, Moritz Finke, Manuel Hegner, Martin Zabel, Christian Zöllner, Felix Naumann
    Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW), 2017
  • A Hybrid Approach for Efficient Unique Column Combination Discovery
    Thorsten Papenbrock, Felix Naumann
    Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW), 2017
  • Data-driven Schema Normalization
    Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2017
    [Paper]  [DOI:10.5441/002/edbt.2017.31]
  • Das Fachgebiet „Informationssysteme“ am Hasso-Plattner-Institut
    Felix Naumann, Ralf Krestel
    Datenbank-Spektrum 17:(1), 2017
    [Paper]  [URL] 
  • What was Hillary Clinton doing in Katy, Texas?
    Toni Gruetze, Ralf Krestel, Konstantina Lazaridou, Felix Naumann
    Proceedings of the International Conference on World Wide Web (WWW), 2017
  • Comparing Features for Ranking Relationships Between Financial Entities Based on Text
    Tim Repke, Michael Loster, Ralf Krestel
    Proceedings of the International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets (DSMM), 2017
    [Paper]  [Poster]  [Slides]  [DOI:10.1145/3077240.3077252]
  • Data Profiling (tutorial)
    Ziawasch Abedjan, Lukasz Golab, Felix Naumann
    Proceedings of the International Conference on Management of Data (SIGMOD), 2017




  • Biterm pseudo document topic model for short text
    Lan Jiang, Hengyang Lu, Ming Xu, Chongjun Wang
    Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI), 2016
    [Paper]  [IEEE]  [DOI:10.1109/ICTAI.2016.0134]
  • Extraction Of Citation Data From Websites Based On Visual Cues
    Tim Repke
    2016
  • Cluster-based Sorted Neighborhood for Efficient Duplicate Detection
    Ahmad Samiei, Felix Naumann
    International Conference on Data Mining Workshops (ICDMW), 2016
  • Approximate Discovery of Functional Dependencies for Large Datasets
    Tobias Bleifuß, Susanne Bülow, Johannes Frohnhofen, Julian Risch, Georg Wiese, Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2016
    [Paper]  [DOI:10.1145/2983323.2983781]
  • Rheem: Enabling Multi-Platform Task Execution (demo)
    Divy Agrawal, Lamine Ba, Laure Berti-Equille, Sanjay Chawla, Ahmed Elmagarmid, Hossam Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Mohammed J. Zaki
    Proceedings of the ACM Conference on Management of Data (SIGMOD), 2016
  • Combination of Rule-based and Textual Similarity Approaches to Match Financial Entities
    Ahmad Samiei, Ioannis Koumarelas, Michael Loster, Felix Naumann
    Data Science for Macro-Modeling with Financial and Economic Datasets (DSMM), 2016
    [Paper]  [URL] 
  • Holistic Data Profiling: Simultaneous Discovery of Various Metadata
    Jens Ehrlich, Mandy Roick, Lukas Schulze, Jakob Zwiener, Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2016
    [Paper]  [Paper] 
  • Classification of German Newspaper Comments
    Christian Godde, Konstantina Lazaridou, Ralf Krestel
    Lernen, Wissen, Daten, Analysen (LWDA), 2016
  • Identifying Political Bias in News Articles
    Konstantina Lazaridou, Ralf Krestel
    International Conference on Theory and Practice of Digital Libraries. IEEE Technical Committee on Digital Libraries, 2016
  • RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets
    Sebastian Kruse, Anja Jentzsch, Thorsten Papenbrock, Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Felix Naumann
    Proceedings of the International Conference on Management of Data (SIGMOD), 2016
    [Paper]  [DOI:10.1145/2882903.2915206]
  • Data Anamnesis: Admitting Raw Data into an Organization
    Sebastian Kruse, Thorsten Papenbrock, Hazar Harmouch, Felix Naumann
    Data Engineering Bulletin 39:(2), 2016
  • A Hybrid Approach to Functional Dependency Discovery
    Thorsten Papenbrock, Felix Naumann
    Proceedings of the International Conference on Management of Data (SIGMOD), 2016
    [Paper]  [DOI:10.1145/2882903.2915203]
  • TextAI: Enhancing TextAE with Intelligent Annotation Support
    Maximilian Grundke, Johannes Jasper, Mariya Perchyk, Jan Philipp Sachse, Ralf Krestel, Mariana Neves
    Proceedings of the International Symposium on Semantic Mining in Biomedicine (SMBM), 2016
    [Paper]  [DOI:10.1007/978-3-319-41754-7_18]
  • Analyzing NIH Funding Patterns over Time with Statistical Text Analysis
    Jihyun Park, Margaret Blume-Kohout, Ralf Krestel, Eric Nalisnick, Padhraic Smyth
    Scholarly Big Data: AI Perspectives, Challenges, and Ideas (SBD) Workshop at AAAI, 2016
  • Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", Potsdam, Germany, September 12-14, 2016
    Ralf Krestel, Davide Mottin, Emmanuel Müller
    CEUR Workshop Proceedings. CEUR-WS.org, 2016
  • Which Answer is Best? Predicting Accepted Answers in MOOC Forums
    Maximilian Jenders, Ralf Krestel, Felix Naumann
    Proceedings of the International Conference Companion on World Wide Web, 2016
  • Topic Shifts in StackOverflow: Ask it like Socrates
    Toni Gruetze, Ralf Krestel, Felix Naumann
    Lecture Notes in Computer Science, 2016
    [Paper]  [DOI:10.1007/978-3-319-41754-7_18]
  • The Information Systems Group at HPI
    Felix Naumann, Ralf Krestel
    SIGMOD Record (2016)
  • Using others’ experiences. Cancer patients’ expectations and navigation of a website providing narratives on prostate, breast and colorectal cancer
    Jennifer Engler, Sandra Adami, Yvonne Adam, Bettina Keller, Tim Repke, Hella Fügemann, Gabriele Lucius-Hoene, Jacqueline Müller-Nordhorn, Christine Holmberg
    Patient Education and Counseling 99:(8), 2016
    [ScienceDirect]  [DOI:10.1016/j.pec.2016.03.015]
  • CohEEL: Coherent and Efficient Named Entity Linking through Random Walks
    Toni Gruetze, Gjergji Kasneci, Zhe Zuo, Felix Naumann
    Web Semantics: Science, Services and Agents on the World Wide Web 37:(C), 2016
    [Paper]  [DOI:10.1016/j.websem.2016.03.001]
  • Efficient Order Dependency Discovery
    Philipp Langer, Felix Naumann
    The VLDB Journal 25:(2), 2016
  • Data Profiling (tutorial)
    Lukasz Golab, Ziawasch Abedjan, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2016




  • Social Media Story Telling
    Patrick Hennig, Philipp Berger, Christian Dullweber, Moritz Finke, Fabian Maschler, Julian Risch, Christoph Meinel
    Proceedings of the International Conference on Social Computing and Networking (SocialCom), 2015
    [Paper]  [DOI:10.1109/SmartCity.2015.84]
  • Ergonomic Interaction for Touch Floors
    Dominik Schmidt, Johannes Frohnhofen, Sven Knebel, Florian Meinel, Mariya Perchyk, Julian Risch, Jonathan Striebel, Julia Wachtel, Patrick Baudisch
    Proceedings of the Conference on Human Factors in Computing Systems (CHI), 2015
    [Paper]  [DOI:10.1145/2702123.2702254]
  • Tweet-Recommender: Finding Relevant Tweets for News Articles
    Ralf Krestel, Thomas Werkmeister, Timur Pratama Wiradarma, Gjergji Kasneci
    Proceedings of the International World Wide Web Conference (WWW), 2015
  • Progressive Duplicate Detection
    Thorsten Papenbrock, Arvid Heise, Felix Naumann
    IEEE Transactions on Knowledge and Data Engineering (TKDE) 27:(5), 2015
    [Paper]  [DOI:10.1109/TKDE.2014.2359666]
  • Scaling Out the Discovery of Inclusion Dependencies
    Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
    Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW), 2015
  • Divide & Conquer-based Inclusion Dependency Discovery
    Thorsten Papenbrock, Sebastian Kruse, Jorge-Arnulfo Quiane-Ruiz, Felix Naumann
    PVLDB 8:(7), 2015
    [Paper]  [DOI:10.14778/2752939.2752946]
  • Data Profiling with Metanome
    Thorsten Papenbrock, Tanja Bergmann, Moritz Finke, Jakob Zwiener, Felix Naumann
    PVLDB 8:(12), 2015
    [Paper]  [DOI:10.14778/2824032.2824086]
  • Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms
    Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schönberg, Jakob Zwiener, Felix Naumann
    PVLDB 8:(10), 2015
    [Paper]  [DOI:10.14778/2794367.2794377]
  • Diversifying Customer Review Rankings
    Ralf Krestel, Nima Dokoohaki
    Neural Networks (2015)
  • Online Temporal Summarization of News Events
    Tobias Schubotz, Ralf Krestel
    Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015
  • Learning Temporal Tagging Behaviour
    Toni Gruetze, Gary Yao, Ralf Krestel
    Proceedings of the International Conference on World Wide Web Companion (WWW), 2015
    [Paper]  [DOI:10.1145/2740908.2741701]
  • How to Stay Up-to-date on Twitter with General Keywords
    Mandy Roick, Maximilian Jenders, Ralf Krestel
    Proceedings of the LWA Workshops: KDML, FGWM, IR, and FGDB, 2015
  • A Serendipity Model For News Recommendation
    Maximilian Jenders, Thorben Lindhauer, Gjergji Kasneci, Ralf Krestel, Felix Naumann
    KI: Advances in Artificial Intelligence - Annual German Conference on AI, 2015
  • Profiling relational data: a survey
    Ziawasch Abedjan, Lukasz Golab, Felix Naumann
    The VLDB Journal 24:(4), 2015
    [Paper]  [DOI:10.1007/s00778-015-0389-y]
  • Uniqueness, Density, and Keyness: Exploring Class Hierarchies
    Anja Jentzsch, Hannes Mühleisen, Felix Naumann
    Proceedings of the International Workshop on Consuming Linked Data (COLD), ISWC, 2015
  • Exploring Linked Data Graph Structures
    Anja Jentzsch, Christian Dullweber, Pierpaolo Troiano, Felix Naumann
    Proceedings of the International Semantic Web Conference, Posters and Demos (ISWC), 2015
  • SOFA: An Extensible Logical Optimizer for UDF-heavy Data Flows
    Astrid Rheinländer, Arvid Heise, Fabian Hueske, Ulf Leser, Felix Naumann
    Information Systems (2015)
  • Estimating Data Integration and Cleaning Effort
    Sebastian Kruse, Paolo Papotti, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2015




  • Multi-label emotion classification for tweets in weibo: Method and application
    Jun Yang, Lan Jiang, Chongjun Wang, Junyuan Xie
    Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI), 2014
    [IEEE]  [DOI:10.1109/ICTAI.2014.71]
  • Versatile optimization of UDF-heavy data flows with SOFA
    Astrid Rheinländer, Martin Beckmann, Anja Kunkel, Arvid Heise, Thomas Stoltmann, Ulf Leser
    Proceedings of the International Conference on Management of Data (SIGMOD), 2014
    [Paper]  [DOI:10.1145/2588555.2594517]
  • The Stratosphere Platform for Big Data Analytics
    Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias J. Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, Daniel Warneke
    The VLDB Journal 23:(6), 2014
  • LODOP - Multi-Query Optimization for Linked Data Profiling Queries
    Benedikt Forchhammer, Anja Jentzsch, Felix Naumann
    Proceedings of the Extended Semantic Web Conference (ESWC), 2014
  • Modeling human newspaper readers: The Fuzzy Believer approach
    Ralf Krestel, Sabine Bergler, René Witte
    Natural Language Engineering 20:(2), 2014
    [Paper]  [DOI:10.1017/S1351324912000289]
  • Detecting Unique Column Combinations on Dynamic Data
    Ziawasch Abedjan, Jorge-Arnulfo Quiane-Ruiz, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2014
  • Data Perspective in Process Choreographies: Modeling and Execution
    Andreas Meyer, Luise Pufahl, Kimon Batoulis, Sebastian Kruse, Thorben Lindhauer, Thomas Stoff, Dirk Fahland, Mathias Weske
    International Conference on Advanced Information Systems Engineering, 2014
  • Assigning Global Relevance Scores to DBpedia Facts
    Philipp Langer, Patrick Schulze, Stefan George, Matthias Kohnen, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
    International Workshop on Data Engineering meets the Semantic Web (DESWeb), 2014
  • Bootstrapping Wikipedia to Answer Ambiguous Person Name Queries
    Toni Gruetze, Gjergji Kasneci, Zhe Zuo, Felix Naumann
    International Workshop on Information Integration on the Web (IIWeb), 2014
  • DFD: Efficient Discovery of Functional Dependencies
    Ziawasch Abedjan, Patrick Schulze, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2014
  • Profiling and Mining RDF Data with ProLOD++
    Ziawasch Abedjan, Toni Gruetze, Anja Jentzsch, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2014
  • Identifying and Determining SPARQL Endpoint Characteristics
    Johannes Lorey
    International Journal of Web Information Systems 10:(3), 2014
  • Semi-Supervised Consensus Clustering: Reducing Human Effort
    Tobias Vogel, Felix Naumann
    Proceedings of the International Workshop on Data Integration and Applications, 2014
  • DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia
    Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer
    Semantic Web Journal (2014)
  • BEL: Bagging for Entity Linking
    Zhe Zuo, Gjergji Kasneci, Toni Gruetze, Felix Naumann
    25th International Conference on Computational Linguistics (COLING), 2014
  • Estimating the Number and Sizes of Fuzzy-Duplicate Clusters
    Arvid Heise, Gjergji Kasneci, Felix Naumann
    Proceedings of the Conference on Information and Knowledge Management (CIKM), 2014
  • Amending RDF Entities with New Facts
    Ziawasch Abedjan, Felix Naumann
    Proceedings of the Extended Semantic Web Conference (ESWC), 2014
  • Reach for Gold: An Annealing Standard to Evaluate Duplicate Detection Results
    Tobias Vogel, Arvid Heise, Uwe Draisbach, Dustin Lange, Felix Naumann
    Journal of Data and Information Quality (JDIQ) 5:(1-2), 2014




  • Storing and Provisioning Linked Data as a Service
    Johannes Lorey
    Proceedings of the Extended Semantic Web Conference (ESWC), 2013
  • Improving RDF Data through Association Rule Mining
    Ziawasch Abedjan, Felix Naumann
    Datenbank-Spektrum (Special Issue on RDF Data Management) 13:(2), 2013
  • Detecting SPARQL Query Templates for Data Prefetching
    Johannes Lorey, Felix Naumann
    Proceedings of the Extended Semantic Web Conference (ESWC), 2013
  • Caching and Prefetching Strategies for SPARQL Queries
    Johannes Lorey, Felix Naumann
    Proceedings of the Extended Semantic Web Conference (ESWC), 2013
  • Analyzing and Predicting Viral Tweets
    Maximilian Jenders, Gjergji Kasneci, Felix Naumann
    Proceedings of the International World Wide Web Conference (WWW), 2013
  • Applying Stratosphere for Big Data Analytics
    Marcus Leich, Jochen Adamek, Moritz Schubotz, Arvid Heise, Astrid Rheinländer, Volker Markl
    Database Systems for Business, Technology, and Web (BTW), 2013
  • Topic modeling for expert finding using latent dirichlet allocation
    Saeedeh Momtazi, Felix Naumann
    WIREs Data Mining and Knowledge Discovery 3:(5), 2013
  • Synonym Analysis for Predicate Expansion
    Ziawasch Abedjan, Felix Naumann
    Proceedings of the Extended Semantic Web Conference (ESWC), 2013
  • SPARQL Endpoint Metrics for Quality-Aware Linked Data Consumption
    Johannes Lorey
    Proceedings of the International Conference on Information Integration and Web-based Applications & Services (iiWAS), 2013
  • Cross-lingual Entity Matching and Infobox Alignment in Wikipedia
    Daniel Rinser, Dustin Lange, Felix Naumann
    Information Systems (IS) 38:(6), 2013
  • Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform
    Felix Naumann, Maximilian Jenders, Thorsten Papenbrock
    Informatik-Spektrum 37:(12), 2013
    [Paper]  [DOI:10.1007/s00287-013-0750-8]
  • Duplicate Detection on GPUs
    Benedikt Forchhammer, Thorsten Papenbrock, Thomas Stening, Sven Viehmeier, Uwe Draisbach, Felix Naumann
    Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW), 2013
  • SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
    Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, Zoubin Ghahramani
    Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013
  • Scalable Discovery of Unique Column Combinations
    Arvid Heise, Jorge-Arnulfo Quiane-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann
    PVLDB, 2013
    [Paper]  [Slides]  [doi] 
  • Caching and Prefetching Strategies for SPARQL Queries
    Johannes Lorey, Felix Naumann
    Proceedings of the International Workshop on Usage Analysis and the Web of Data (USEWOD), 2013
  • Cost-Aware Query Planning for Similarity Search
    Dustin Lange, Felix Naumann
    Information Systems (IS) 38:(4), 2013
  • Bulk Sorted Access for Efficient Top-k Retrieval
    Dustin Lange, Felix Naumann
    Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM), 2013
  • Systematic ETL Management – Experiences with High-Level Operators
    Alexander Albrecht, Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2013
  • SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows
    Astrid Rheinländer, Arvid Heise, Fabian Hueske, Ulf Leser, Felix Naumann
    2013
  • On Choosing Thresholds for Duplicate Detection
    Uwe Draisbach, Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2013
  • Data Profiling Revisited
    Felix Naumann
    SIGMOD Record 42:(4), 2013




  • Efficient Similarity Search in Very Large String Sets
    Dandy Fenz, Dustin Lange, Astrid Rheinländer, Felix Naumann, Ulf Leser
    Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM), 2012
  • Schema Decryption for Large Extract-Transform-Load Systems
    Alexander Albrecht, Felix Naumann
    Proceedings of the International Conference on Conceptual Modeling (ER), 2012
  • Integrating Open Government Data with Stratosphere for more Transparency
    Arvid Heise, Felix Naumann
    Web Semantics: Science, Services and Agents on the World Wide Web 14:(1), 2012
    [Paper]  [DOI:10.1016/j.websem.2012.02.002]
  • The Data Analytics Group at the Qatar Computing Research Institute
    George Beskales, Gautam Das, Ahmed K. Elmagarmid, Ihab F. Ilyas, Felix Naumann, Mourad Ouzzani, Paolo Papotti, Jorge Quiane-Ruiz, Nan Tang
    SIGMOD Record 41:(4), 2012
  • Automatic Blocking Key Selection for Duplicate Detection based on Unigram Combinations
    Tobias Vogel, Felix Naumann
    Proceedings of the International Workshop on Quality in Databases (QDB) in conjunction with VLDB, 2012
  • Scalable Similarity Search with Dynamic Similarity Measures
    Martin Köppelmann, Dustin Lange, Claudia Lehmann, Marika Marszalkowski, Felix Naumann, Peter Retzlaff, Sebastian Stange, Lea Voget
    Proceedings of the International Workshop on Ranking in Databases (DBRank) in conjunction with VLDB, 2012
  • Scalable Iterative Graph Duplicate Detection
    Melanie Herschel, Felix Naumann, Sascha Szott, Maik Taubert
    Transactions on Knowledge and Data Engineering (TKDE) 24:(11), 2012
  • Latent Topics in Graph-Structured Data
    Christoph Böhm, Gjergji Kasneci, Felix Naumann
    Proceedings of the Conference on Information and Knowledge Management (CIKM), 2012
  • Discovering Conditional Inclusion Dependencies
    Jana Bauckmann, Ziawasch Abedjan, Heiko Müller, Ulf Leser, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2012
  • Understanding Cryptic Schemata in Large Extract-Transform-Load Systems
    Alexander Albrecht, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2012
  • Fine-grained German Sentiment Analysis on Social Media
    Saeedeh Momtazi
    Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2012
  • Fusion Cubes: Towards Self-Service Business Intelligence
    Alberto Abelló, Jérôme Darmont, Lorena Etcheverry, Matteo Golfarelli, Jose-Norberto Mazón, Felix Naumann, Torben Bach Pedersen, Stefano Rizzi, Juan Trujillo, Panos Vassiliadis, Gottfried Vossen
    International Journal of Data Warehousing and Mining (IJDWM) 9:(2), 2012
  • Holistic and Scalable Ontology Alignment for Linked Open Data
    Toni Gruetze, Christoph Böhm, Felix Naumann
    Proceedings of the Linked Data on the Web (LDOW) Workshop at the International World Wide Web Conference (WWW), 2012
  • Bayesian online clustering of eye movement data
    Enkelejda Tafaj, Gjergji Kasneci, Wolfgang Rosenstiel, Martin Bogdan
    Proceedings of the Symposium on Eye-Tracking Research and Applications, 2012
    [Paper]  [DOI:10.1145/2168556.2168617]
  • Adaptive Windows for Duplicate Detection
    Uwe Draisbach, Felix Naumann, Sascha Szott, Oliver Wonneberg
    Proceedings of the International Conference on Data Engineering (ICDE), 2012
  • GovWILD: Integrating Open Government Data for Transparency (demo)
    Christoph Böhm, Markus Freitag, Arvid Heise, Claudia Lehmann, Andrina Mascher, Felix Naumann, Mauricio Hernandez, Vuk Ercegovac, Peter Haase
    Proceedings of the International World Wide Web Conference (WWW), 2012
  • Reconciling Ontologies and the Web of Data
    Ziawasch Abedjan, Johannes Lorey, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2012
  • Covering or complete? : discovering conditional inclusion dependencies
    Jana Bauckmann, Ziawasch Abedjan, Ulf Leser, Heiko Müller, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2012
  • LINDA: Distributed Web-of-Data-Scale Entity Matching
    Christoph Böhm, Gerard de Melo, Felix Naumann, Gerhard Weikum
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, 2012
  • Partitionierung zur effizienten Duplikaterkennung in relationalen Daten
    Uwe Draisbach
    Ausgezeichnete Arbeiten zur Informationsqualität. Springer Vieweg, 2012
  • Scalable Peer-to-Peer-based RDF Management
    Christoph Böhm, Daniel Hefenbrock, Felix Naumann
    Proceedings of the International Conference on Semantic Systems, 2012
  • Reasoning about Knowledge from the Web - (Extended Abstract)
    Gjergji Kasneci
    ICWE Workshops, 2012
    [Paper]  [DOI:10.1007/978-3-642-35623-0_19]
  • Meteor/Sopremo: An Extensible Query Language and Operator Model
    Arvid Heise, Astrid Rheinländer, Marcus Leich, Ulf Leser, Felix Naumann
    Proceedings of the International Workshop on End-to-end Management of Big Data (BigData) in conjunction with VLDB, 2012
  • Adaptive Windows for Duplicate Detection
    Uwe Draisbach, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2012




  • Advancing the Discovery of Unique Column Combinations
    Ziawasch Abedjan, Felix Naumann
    Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2011
  • RDF Ontology (Re-)Engineering through Large-scale Data Mining
    Johannes Lorey, Ziawasch Abedjan, Felix Naumann, Christoph Böhm
    Billion Triples Challenge (BTC) at the International Semantic Web Conference (ISWC), 2011
  • Black Swan: Augmenting Statistics with Event Data
    Johannes Lorey, Felix Naumann, Benedikt Forchhammer, Andrina Mascher, Peter Retzlaff, Armin ZamaniFarahani, Soeren Discher, Cindy Faehnrich, Stefan Lemme, Thorsten Papenbrock, Robert Christoph Peschel, Stephan Richter, Thomas Stening, Sven Viehmeier
    Proceedings of the Conference on Information and Knowledge Management (CIKM), 2011
  • Instance-based one-to-some Assignment of Similarity Measures to Attributes
    Tobias Vogel, Felix Naumann
    Proceedings of the International Conference on Cooperative Information Systems (CoopIS), 2011
  • Projektseminar "Similarity Search Algorithms"
    Dustin Lange, Tobias Vogel, Uwe Draisbach, Felix Naumann
    Datenbank-Spektrum 11:(1), 2011
  • SPRINT: ranking search results by paths
    Christoph Böhm, Eyk Kny, Benjamin Emde, Ziawasch Abedjan, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2011
  • Advancing the Discovery of Unique Column Combinations
    Ziawasch Abedjan, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2011
  • Frequency-aware Similarity Measures
    Dustin Lange, Felix Naumann
    Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2011
  • Context and Target Configurations for Mining RDF Data
    Ziawasch Abedjan, Felix Naumann
    International Workshop on Search & Mining Entity-Relationship Data (SMER), 2011
  • A Generalization of Blocking and Windowing Algorithms for Duplicate Detection
    Uwe Draisbach, Felix Naumann
    Proceedings of the International Conference on Data and Knowledge Engineering (ICDKE), 2011
  • Efficient Similarity Search: Arbitrary Similarity Measures, Arbitrary Composition
    Dustin Lange, Felix Naumann
    Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2011
  • Kurz erklärt: Datenfusion
    Jens Bleiholder, Felix Naumann
    Datenbank-Spektrum 11:(1), 2011
  • Eliminating NULLs with Subsumption and Complementation
    Jens Bleiholder, Melanie Herschel, Felix Naumann
    Data Engineering Bulletin 34:(3), 2011
  • Improving Service Discovery through Enriched Service Descriptions
    Mohammed AbuJarour, Felix Naumann
    Datenbanksysteme für Business, Technologie und Web (BTW), 2011
  • Creating voiD Descriptions for Web-scale Data
    Christoph Böhm, Johannes Lorey, Felix Naumann
    Journal of Web Semantics: Science, Services and Agents on the World Wide Web 9:(3), 2011
    [Paper]  [DOI:10.1016/j.websem.2011.06.001]




  • Profiling linked open data with ProLOD
    Christoph Böhm, Felix Naumann, Ziawasch Abedjan, Dandy Fenz, Toni Gruetze, Daniel Hefenbrock, Matthias Pohl, David Sonnabend
    Proceedings of the International Conference on Data Engineering (ICDE), 2010
  • Efficient and Exact Computation of Inclusion Dependencies for Data Integration
    Jana Bauckmann, Ulf Leser, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2010
  • Extracting structured information from Wikipedia articles to populate infoboxes
    Dustin Lange, Christoph Böhm, Felix Naumann
    Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2010
  • An Introduction to Duplicate Detection
    Felix Naumann, Melanie Herschel
    Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2010
  • Dynamic tags for dynamic data web services
    Mohammed AbuJarour, Felix Naumann
    Proceedings of the Workshop on Emerging Web Services Technology (WEWST), 2010
  • Proceedings of the 13th International Conference on Extending Database Technology (EDBT), Lausanne, Switzerland
    Xin Luna Dong, Felix Naumann
    ACM International Conference Proceeding Series. ACM, 2010
  • DuDe: The Duplicate Detection Toolkit
    Uwe Draisbach, Felix Naumann
    Proceedings of the International Workshop on Quality in Databases (QDB), 2010
  • Towards Granular Data Placement Strategies for Cloud Platforms
    Johannes Lorey, Felix Naumann
    Proceedings of the International Conference on Granular Computing (GrC), 2010
  • Towards a diamond SOA operational model
    Mohammed AbuJarour, Felix Naumann
    IEEE International Conference on Service-Oriented Computing and Applications (SOCA), 2010
  • 13th International Workshop on the Web and Databases: WebDB 2010 (workshop report)
    Xin Luna Dong, Felix Naumann
    SIGMOD Record 39:(3), 2010
  • Proceedings of the 13th International Workshop on the Web and Databases (WebDB), Indianapolis, IN
    Xin Luna Dong, Felix Naumann
    ACM, 2010
  • Extracting structured information from Wikipedia articles to populate infoboxes
    Dustin Lange, Christoph Böhm, Felix Naumann
    Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 2010
  • Collecting, Annotating, and Classifying Public Web Services
    Mohammed AbuJarour, Felix Naumann, Mircea Craculeac
    On the Move to Meaningful Internet Systems: OTM - Confederated International Conferences: CoopIS, IS, DOA and ODBASE, 2010
  • Linking open government data: what journalists wish they had known
    Christoph Böhm, Felix Naumann, Markus Freitag, Stefan George, Norman Höfler, Martin Köppelmann, Claudia Lehmann, Andrina Mascher, Tobias Schmidt
    Proceedings of the International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
  • Creating voiD Descriptions for Web-Scale Data
    Christoph Böhm, Johannes Lorey, Dandy Fenz, Eyk Kny, Matthias Pohl, Felix Naumann
    Billion Triples Challenge (BTC) at the International Semantic Web Conference (ISWC), 2010
  • Complement union for data integration
    Jens Bleiholder, Sascha Szott, Melanie Herschel, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2010
  • Graph-based concept identification and disambiguation for enterprise search
    Falk Brauer, Michael Huber, Gregor Hackenbroich, Ulf Leser, Felix Naumann, Wojciech M. Barczynski
    Proceedings of the International Conference on World Wide Web (WWW), 2010
  • Self-Adaptive Data Quality Web Services
    Tobias Vogel
    Grundlagen von Datenbanken, 2010
  • Subsumption and complementation as data fusion operators
    Jens Bleiholder, Sascha Szott, Melanie Herschel, Frank Kaufer, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2010




  • Graph-Based Ontology Construction from Heterogeneous Evidences
    Christoph Böhm, Philip Groth, Ulf Leser
    Proceedings of the International Semantic Web Conference (ISWC), 2009
  • Data fusion - Resolving Data Conflicts for Integration (tutorial)
    Xin Luna Dong, Felix Naumann
    PVLDB 2:(2), 2009
  • A Machine Learning Approach to Foreign Key Discovery
    Alexandra Rostin, Oliver Albrecht, Jana Bauckmann, Felix Naumann, Ulf Leser
    Proceedings of the International Workshop on the Web and Databases (WebDB), 2009
  • A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection
    Uwe Draisbach, Felix Naumann
    Proceedings of the International Workshop on Quality in Databases (QDB), 2009
  • POSR: A Comprehensive System for Aggregating and Using Web Services (demo)
    Mohammed AbuJarour, Mircea Craculeac, Falko Menge, Tobias Vogel, Jan-Felix Schwarz
    Proceedings of the IEEE Services Cup at IEEE International Conference on Web Services (ICWS), 2009
  • Encapsulating Multi-stepped Web Forms as Web Services
    Tobias Vogel, Frank Kaufer, Felix Naumann
    Proceedings of the International Conference on Service-Oriented Computing (ICSOC), 2009
  • METL: Managing and Integrating ETL Processes
    Alexander Albrecht, Felix Naumann
    Proceedings of the VLDB PhD Workshop, 2009
  • Guest Editorial for the Special Issue on Data Quality in Databases
    Felix Naumann, Louiqa Raschid
    Journal of Data and Information Quality (JDIQ) 1:(2), 2009




  • Data fusion
    Jens Bleiholder, Felix Naumann
    ACM Computing Surveys 41:(1), 2008
  • Industry-scale duplicate detection
    Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lufter, Holger Schuster
    PVLDB 1:(2), 2008
  • Scaling up duplicate detection in graph data
    Melanie Herschel, Felix Naumann
    Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2008
  • Managing ETL Processes
    Alexander Albrecht, Felix Naumann
    Proceedings of the International Workshop on New Trends in Information Integration (NTII), Auckland, New Zealand, 2008
  • A research agenda for query processing in large-scale peer data management systems
    Katja Hose, Armin Roth, Andre Zeitz, Kai-Uwe Sattler, Felix Naumann
    Information Systems (IS) 33:(7-8), 2008
  • Automated data augmentation services using text mining, data cleansing and web crawling techniques
    Matthias Jacob, Alexander Kuscher, Christoph Thiele, Max Plauth
    IEEE Congress on Services, 2008




  • Efficiently Detecting Inclusion Dependencies
    Jana Bauckmann, Ulf Leser, Felix Naumann, Veronique Tietz
    Proceedings of the International Conference on Data Engineering (ICDE), 2007
  • Schema- und Metadatenmanagement in Peer Data Management Systemen
    Felix Naumann
    Datenbanksysteme in Business, Technologie und Web (BTW), Workshop Proceedings, 2007
  • A Classification of Schema Mappings and Analysis of Mapping Tools
    Frank Legler, Felix Naumann
    Proceedings of Datenbanksysteme in Business, Technologie und Web (BTW), 2007
  • FuSem - Exploring Different Semantics of Data Fusion (demo)
    Jens Bleiholder, Karsten Draba, Felix Naumann
    Proceedings of the International Conference on Very Large Data Bases (VLDB), 2007
  • System P: Completeness-driven Query Answering in Peer Data Management Systems (demo)
    Armin Roth, Felix Naumann
    Datenbanksysteme in Business, Technologie und Web (BTW), 2007
  • Datenqualität
    Felix Naumann
    Informatik-Spektrum 30:(1), 2007
  • Emergent Data Quality Annotation And Visualization
    Paul Führing, Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2007
  • Rule-Based Measurement Of Data Quality In Nominal Data
    Jochen Hipp, Markus Müller, Johannes Hohendorff, Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2007
  • Answering Top K Queries Efficiently with Overlap of Answers in Sources or Source Paths
    Louiqa Raschid, Maria Esther Vidal, Yao Wu, Felix Naumann, Jens Bleiholder
    Proceedings of the International Workshop on Information Integration on the Web (IIWeb), 2007
  • Peer-Daten-Management-Systeme - PDMS
    Felix Naumann, Armin Roth
    Datenbank-Spektrum (2007)
  • Proceedings of the 5th International Workshop on Quality in Databases (QDB)
    Venkatesh Ganti, Felix Naumann
    2007
  • Networked PIM using PDMS
    Alexander Albrecht, Felix Naumann
    Proceedings of the International Workshop Networking Meets Databases (NetDB), 2007




  • Conflict Handling Strategies in an Integrated Information System
    Jens Bleiholder, Felix Naumann
    Proceedings of the International Workshop on Information Integration on the Web (IIWeb), 2006
  • Query Planning in the Presence of Overlapping Sources
    Jens Bleiholder, Samir Khuller, Felix Naumann, Louiqa Raschid, Yao Wu
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2006
  • XML Duplicate Detection Using Sorted Neighborhoods
    Sven Puhlmann, Melanie Weis, Felix Naumann
    Proceedings of the International Conference on Extending Database Technology (EDBT), 2006
  • Assessing the Completeness of Sensor Data
    Jit Biswas, Felix Naumann, Qiang Qiu
    Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), 2006
  • Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies
    Felix Naumann, Alexander Bilke, Jens Bleiholder, Melanie Weis
    Data Engineering Bulletin 29:(2), 2006
  • XStruct: Efficient Schema Extraction from Multiple and Large XML Documents
    Jan Hegewald, Felix Naumann, Melanie Weis
    Proceedings of the International Conference on Data Engineering (ICDE), 2006
  • Efficiently Computing Inclusion Dependencies for Schema Discovery
    Jana Bauckmann, Ulf Leser, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2006
  • Proceedings of the Data Integration in the Life Sciences Workshop (DILS)
    Ulf Leser, Felix Naumann, Barbara Eckman
    Lecture Notes in Computer Science. Springer, 2006
  • System P: Query Answering in PDMS under Limited Resources
    Armin Roth, Felix Naumann, Tobias Hübner, Martin Schweigert
    Proceedings of the International Workshop on Information Integration on the Web (IIWeb), 2006
  • Informationsintegration: Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen
    Ulf Leser, Felix Naumann
    dpunkt, 2006
  • Detecting Duplicates in Complex XML Data
    Melanie Weis, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2006
  • Information Quality: How Good are Off-the-Shelf DBMS?
    Felix Naumann, Mary Roth
    Information Quality Management: Theory and Applications. Idea Group Inc., 2006




  • (Almost) Hands-Off Information Integration for the Life Sciences
    Ulf Leser, Felix Naumann
    Proceedings of the International Conference on Innovative Database Research (CIDR), 2005
  • Self-Extending Peer Data Management
    Ralf Heese, Sven Herschel, Felix Naumann, Armin Roth
    Datenbanksysteme in Business, Technologie und Web (BTW), Karlsruhe, Germany, 2005
  • Enhancing the Semantics of Links and Paths in Life Science Sources
    Stephan Heymann, Felix Naumann, Peter Rieger, Louiqa Raschid
    ICDT Workshop on Database Issues in Biological Databases (DBiBD), 2005
  • Declarative Data Fusion - Syntax, Semantics, and Implementation
    Jens Bleiholder, Felix Naumann
    Proceedings of the International Conference on Advances in Databases and Information Systems (ADBIS), 2005
  • Proceedings of the 2005 International Conference on Information Quality (MIT IQ Conference), Sponsored by Lockheed Martin, MIT, Cambridge, MA, USA, November 10-12, 2006

    MIT, 2005
  • Ein Data-Quality-Wettbewerb
    Michael Mielke, Heiko Müller, Felix Naumann
    Datenbank-Spektrum (2005)
  • A Data Model and Query Language to Explore Enhanced Links and Paths in Life Science Sources
    George A. Mihaila, Felix Naumann, Louiqa Raschid, Maria-Esther Vidal
    Proceedings of the International Workshop on the Web & Databases (WebDB), 2005
  • DogmatiX Tracks down Duplicates in XML
    Melanie Weis, Felix Naumann
    Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2005
  • Benefit and Cost of Query Answering in PDMS
    Armin Roth, Felix Naumann
    Proceedings of the Databases, Information Systems, and Peer-to-Peer Computing Workshop (DBISP2P), Seoul, Korea, 2005
  • Fuzzy Duplicate Detection on XML Data
    Melanie Weis
    Proceedings of the VLDB PhD Workshop, 2005
  • Schema Matching using Duplicates
    Alexander Bilke, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2005
  • Automatic Data Fusion with HumMer (demo)
    Alexander Bilke, Jens Bleiholder, Christoph Böhm, Karsten Draba, Felix Naumann, Melanie Weis
    Proceedings of the International Conference on Very Large Data Bases (VLDB), 2005
  • A Duplicate Detection Benchmark for XML (and Relational) Data
    Melanie Weis, Felix Naumann, Franziska Brosy
    Proceedings of the SIGMOD International Workshop on Information Quality for Information Systems (IQIS), 2005
  • Beitragsband zum Studierenden-Programm bei der 11. Fachtagung "Datenbanken für Business, Technologie und Web", GI Fachbereich Datenbanken und Informationssysteme, Karlsruhe
    Hagen Höpfner, Gunter Saake, Felix Naumann, Andreas Heuer
    Universität Magdeburg, Fakultät für Informatik, 2005
  • Clio: A Schema Mapping Tool for Information Integration
    Mauricio A. Hernández, Lucian Popa, Howard Ho, Felix Naumann
    Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN), 2005




  • Information Quality: How Good Are Off-The-Shelf DBMS?
    Felix Naumann, Mary Roth
    Proceedings of the International Conference on Information Quality (ICIQ), Cambridge, MA, 2004
  • Proceedings of the International Workshop on Information Quality in Information Systems (SIGMOD Workshop)
    Felix Naumann, Monica Scannapieco
    ACM, 2004
  • Labeling and Enhancing Life Sciences Links
    Stephan Heymann, Felix Naumann, Louiqa Raschid, Peter Rieger
    Proceedings of the International IEEE Computer Society Computational Systems Bioinformatics Conference (CSB), 2004
  • Eine Übung zur Vorlesung Informationsintegration
    Felix Naumann, Jens Bleiholder, Melanie Weis
    Datenbank-Spektrum (2004)
  • Informationsintegration
    Felix Naumann
    Öffentliche Vorlesung an der Humboldt-Universität zu Berlin, 2004
  • Qualitäts- und Semantik-gesteuerte Anfragebearbeitung für Peer-basierte Datenmanagementsysteme (PDMS)
    Armin Roth, Felix Naumann
    INFORMATIK - Band 1, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI), Ulm, Germany, 2004
  • Querying Web-Accessible Life Science Sources: Which paths to choose?
    Jens Bleiholder, Felix Naumann, Louiqa Raschid, Maria Esther Vidal
    Proceedings of the International Workshop on Information Integration on the Web (IIWeb), 2004
  • Links and Paths through Life Sciences Data Sources
    Zoé Lacroix, Hyma Murthy, Felix Naumann, Louiqa Raschid
    Humboldt-Universität zu Berlin, Institut für Informatik, 2004
  • BioFast: Challenges in Exploring Linked Life Science Sources
    Jens Bleiholder, Zoé Lacroix, Hyma Murthy, Felix Naumann, Louiqa Raschid, Maria-Esther Vidal
    SIGMOD Record 33:(2), 2004
  • Links and Paths through Life Sciences Data Sources
    Zoé Lacroix, Hyma Murthy, Felix Naumann, Louiqa Raschid
    Proceedings of the International Workshop on Data Integration in the Life Sciences (DILS), 2004
  • FUSE BY: Syntax und Semantik zur Informationsfusion in SQL
    Jens Bleiholder, Felix Naumann
    INFORMATIK, Band 1, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI), 2004
  • Detecting Duplicate Objects in XML Documents
    Melanie Weis, Felix Naumann
    International Workshop on Information Quality in Information Systems (IQIS), 2004
  • Completeness of integrated information sources
    Felix Naumann, Johann Christoph Freytag, Ulf Leser
    Information Systems (IS) 29:(7), 2004




  • Exploring Life Sciences Data Sources
    Zoé Lacroix, Felix Naumann, Louiqa Raschid, Maria-Esther Vidal
    Proceedings of the Workshop on Information Integration on the Web (IIWeb), 2003
  • Information Quality Assessment and Measurement
    Felix Naumann, Cinzia Cappiello, Vipul Kashyap, Gunter Saake
    Data Quality on the Web, 2003
  • Super-Fast XML Wrapper Generation in DB2: A Demonstration
    Vanja Josifovski, Sabine Massmann, Felix Naumann
    Proceedings of the International Conference on Data Engineering (ICDE), 2003
  • Object Identification Quality
    Mattis Neiling, Steffen Jurk, Hans-J. Lenz, Felix Naumann
    Proceedings of the International Workshop on Data Quality in Cooperative Information Systems (DQCIS), 2003
  • Semantic Overlay Clusters within Super-Peer Networks
    Alexander Löser, Felix Naumann, Wolf Siberski, Wolfgang Nejdl, Uwe Thaden
    First International Workshop on Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P), 2003
  • Data Quality in Genome Databases
    Heiko Müller, Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2003
  • Qualitätsgesteuerte Anfragebearbeitung für Integrierte Informationssysteme
    Felix Naumann
    it - Information Technology 45:(1), 2003
  • Completeness of Information Sources
    Felix Naumann, Johann-Christoph Freytag, Ulf Leser
    Proceedings of the International Workshop on Data Quality in Cooperative Information Systems (DQCIS), 2003




  • Schema Management
    Periklis Andritsos, Ronald Fagin, Ariel Fuxman, Laura M. Haas, Mauricio A. Hernández, C. T. Howard Ho, Anastasios Kementsietsidis, Renée J. Miller, Felix Naumann, Lucian Popa, Yannis Velegrakis, Charlotte Vilarem, Ling-Ling Yan
    Data Engineering Bulletin 25:(3), 2002
  • Declarative Data Merging with Conflict Resolution
    Felix Naumann, Matthias Häussler
    Proceedings of the International Conference on Information Quality (ICIQ), 2002
  • Quality-Driven Query Answering for Integrated Information Systems
    Felix Naumann
    Lecture Notes in Computer Science. Springer, 2002
  • Mapping XML and Relational Schemas with Clio (demo)
    Mauricio A. Hernández, Lucian Popa, Yannis Velegrakis, Renée J. Miller, Felix Naumann, Ching-Tien Ho
    Proceedings of the International Conference on Data Engineering (ICDE), 2002
  • Schema Mapping and Data Integration with Clio (demo)
    Barbara Eckman, Mauricio Hernandez, Howard Ho, Felix Naumann, Lucian Popa
    Intelligent Systems for Molecular Biology (ISMB), 2002
  • Attribute Classification Using Feature Analysis
    Felix Naumann, Ching-Tien Ho, Xuqing Tian, Laura M. Haas, Nimrod Megiddo
    Proceedings of the International Conference on Data Engineering (ICDE), 2002
  • Attribute Classification Using Feature Analysis
    Felix Naumann, Ching-Tien Ho, Xuqing Tian, Laura Haas, Nimrod Megiddo
    IBM Almaden Research Center, 2002




  • From Databases to Information Systems - Information Quality Makes the Difference
    Felix Naumann
    Proceedings of the International Conference on Information Quality (ICIQ), 2001




  • Approximate Tree Embedding for Querying XML Data
    Torsten Schlieder, Felix Naumann
    Proceedings of the ACM SIGIR Workshop on XML and Information Retrieval, 2000
  • Assessment Methods for Information Quality Criteria
    Felix Naumann, Claudia Rolker
    Proceedings of the International Conference on Information Quality (ICIQ), 2000
  • Completeness of Information Sources
    Felix Naumann, Johann-Christoph Freytag
    Humboldt-Universität zu Berlin, Institut für Informatik, 2000
  • Assessment Methods for Information Quality Criteria
    Felix Naumann, Claudia Rolker
    Humboldt-Universität zu Berlin, Institut für Informatik, 2000
  • Maximizing Coverage of Mediated Web Queries
    Ramana Yerneni, Felix Naumann, Hector Garcia-Molina
    Stanford University, CA, 2000
  • Quality-driven Query Planning
    Felix Naumann
    Proceedings of the EDBT PhD Workshop, 2000
  • Cooperative Query Answering with Density Scores
    Felix Naumann, Ulf Leser
    Proceedings of the International Conference on Management of Data (COMAD), 2000
  • Query Planning with Information Quality Bounds
    Ulf Leser, Felix Naumann
    Proceedings of the International Conference on Flexible Query Answering Systems (FQAS), 2000




  • Quality-driven Integration of Heterogeneous Information Systems
    Felix Naumann, Ulf Leser, Johann Christoph Freytag
    Proceedings of the International Conference on Very Large Data Bases (VLDB), 1999
  • Quality-driven Integration of Heterogeneous Information Systems
    Felix Naumann, Ulf Leser, Johann-Christoph Freytag
    Humboldt-Universität zu Berlin, Institut für Informatik, 1999
  • Density Scores for Cooperative Query Answering
    Felix Naumann, Ulf Leser
    Workshop on Föderierte Datenbanken (FDBMS), 1999
  • Do Metadata Models meet IQ Requirements?
    Felix Naumann, Claudia Rolker
    Proceedings of the International Conference on Information Quality (ICIQ), 1999




  • Quality Driven Source Selection Using Data Envelopment Analysis
    Felix Naumann, Johann Christoph Freytag, Myra Spiliopoulou
    Proceedings of the International Conference on Information Quality (ICIQ), 1998
  • Data Fusion and Data Quality
    Felix Naumann
    Proceedings of the New Techniques & Technologies for Statistics Seminar (NTTS), 1998