-
Bleifuß, T., Bornemann, L., Kalashnikov, D.V., Naumann, F., Srivastava, D.: DBChEx: Interactive Exploration of Data and Schema Change. Proceedings of the Conference on Innovative Data Systems Research (CIDR) (2019).
-
Risch, J., Stoll, A., Ziegele, M., Krestel, R.: hpiDEDIS at GermEval 2019: Offensive Language Identification using a German BERT model. Proceedings of the 15th Conference on Natural Language Processing (KONVENS). pp. 403-408. German Society for Computational Linguistics & Language Technology, Erlangen, Germany (2019).
Pre-training language representations on large text corpora, for example with BERT, has recently been shown to achieve impressive performance at a variety of downstream NLP tasks. So far, applying BERT to offensive language identification for German-language texts failed due to the lack of pre-trained, German-language models. In this paper, we fine-tune a BERT model that was pre-trained on 12 GB of German texts to the task of offensive language identification. This model significantly outperforms our baselines and achieves a macro F1 score of 76% on coarse-grained, 51% on fine-grained, and 73% on implicit/explicit classification. We analyze the strengths and weaknesses of the model and derive promising directions for future work.
-
Kruse, S., Kaoudi, Z., Quiané-Ruiz, J.-A., Chawla, S., Naumann, F., Contreras-Rojas, B.: Optimizing Cross-Platform Data Movement. Proceedings of the International Conference on Data Engineering (ICDE) (2019).
-
Jiang, L., Vitagliano, G., Naumann, F.: A Scoring-based Approach for Data Preparator Suggestion. Presented at the (2019).
Self-service data preparation enables end users to prepare data by themselves. However, the plethora of possible data preparation steps can overwhelm the user. We introduce a score-based preparator ranking approach to propose preparator candidates in a context-specific manner. To this end, we give scoring functions for a selected set of preparators and outline future work towards a full-fledged data preparation system.
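To illustrate the general idea of score-based preparator ranking, the following sketch ranks a few preparator candidates for a column by simple context scores; the preparator names and scoring functions are hypothetical stand-ins, not the ones defined in the paper.
```python
# Hypothetical sketch of score-based preparator ranking (not the paper's scoring functions).
from typing import Callable, Dict, List

Column = List[str]

def score_split_attribute(col: Column) -> float:
    # Reward columns whose values consistently contain a delimiter.
    return sum(("," in v or ";" in v) for v in col) / max(len(col), 1)

def score_remove_missing(col: Column) -> float:
    # Reward columns with many empty or placeholder values.
    return sum(v.strip() in {"", "NULL", "N/A"} for v in col) / max(len(col), 1)

def score_trim_whitespace(col: Column) -> float:
    # Reward columns with leading or trailing whitespace.
    return sum(v != v.strip() for v in col) / max(len(col), 1)

PREPARATORS: Dict[str, Callable[[Column], float]] = {
    "SplitAttribute": score_split_attribute,
    "RemoveMissing": score_remove_missing,
    "TrimWhitespace": score_trim_whitespace,
}

def suggest(col: Column, top_k: int = 2) -> List[str]:
    # Rank preparators by their context-specific score for this column.
    return sorted(PREPARATORS, key=lambda name: PREPARATORS[name](col), reverse=True)[:top_k]

column = [" Alice,Smith", "Bob,Jones ", "N/A", "Carol,Lee"]
print(suggest(column))  # ['SplitAttribute', 'TrimWhitespace']
```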
-
Dürsch, F., Stebner, A., Windheuser, F., Fischer, M., Friedrich, T., Strelow, N., Bleifuß, T., Harmouch, H., Jiang, L., Papenbrock, T., Naumann, F.: Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms. Proceedings of the International Conference on Information and Knowledge Management (CIKM) (2019).
-
Schirmer, P., Papenbrock, T., Kruse, S., Naumann, F., Hempfing, D., Mayer, T., Neuschäfer-Rube, D.: DynFD: Functional Dependency Discovery in Dynamic Datasets. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 253-264 (2019).
Functional dependencies (FDs) support various tasks for the management of relational data, such as schema normalization, data cleaning, and query optimization. However, while existing FD discovery algorithms regard only static datasets, many real-world datasets are constantly changing – and with them their FDs. Unfortunately, the computational hardness of FD discovery prohibits a continuous re-execution of those existing algorithms with every change of the data. To this end, we propose DynFD, the first algorithm to discover and maintain functional dependencies in dynamic datasets. Whenever the inspected dataset changes, DynFD evolves its FDs rather than recalculating them. For this to work efficiently, we propose indexed data structures along with novel and efficient update operations. Our experiments compare DynFD's incremental mode of operation to the repeated re-execution of existing, static algorithms. They show that DynFD can maintain the FDs of dynamic datasets over an order of magnitude faster than its static counterparts.
-
Risch, J., Krestel, R.: Toxic Comment Detection in Online Discussions. In: Agarwal, B., Nayak, R., Mittal, N., and Patnaik, S. (eds.) Deep Learning Based Approaches for Sentiment Analysis. Springer (2019).
With the exponential growth in the use of social media networks such as Twitter, Facebook, and many others, an astronomical amount of big data has been generated. A substantial amount of this user-generated data is in the form of text, such as reviews, tweets, and blogs, which provides numerous challenges as well as opportunities for NLP (Natural Language Processing) researchers to discover meaningful information for various applications. Sentiment analysis is the study that analyses people's opinions and sentiments towards entities present in the text, such as products, services, persons, and organisations. Sentiment analysis and opinion mining are among the most popular and interesting research problems. In recent years, deep learning approaches have emerged as powerful computational models and have shown significant success in dealing with massive amounts of data in unsupervised settings. Deep learning is revolutionary because it offers an effective way of learning representations and allows a system to learn features automatically from data without the need to design them explicitly. Deep learning algorithms such as deep autoencoders, convolutional and recurrent neural networks (CNNs, RNNs), Long Short-Term Memory (LSTM), and Generative Adversarial Networks (GANs) have been reported to provide significantly improved results in various natural language processing tasks, including sentiment analysis.
-
Naumann, F.: The relational database management systems genealogy. In: Brodie, M.L. (ed.) Making Databases Work. pp. 173-179. ACM / Morgan & Claypool (2019).
-
Pena, E.H.M., de Almeida, E.C., Naumann, F.: Discovery of Approximate (and Exact) Denial Constraints. PVLDB. 13, (2019).
Maintaining data consistency is known to be hard. Recent approaches have relied on integrity constraints to deal with the problem - correct and complete constraints naturally work towards data consistency. State-of-the-art data cleaning frameworks have used the formalism known as denial constraint (DC) to handle a wide range of real-world constraints. Each DC expresses a relationship between predicates that indicate which combinations of attribute values are inconsistent. The design of DCs, however, must keep pace with the complexity of data and applications. The alternative to designing DCs by hand is automatically discovering DCs from data, which is computationally expensive due to the large search space of DCs. To tackle this challenging task, we present a novel algorithm to efficiently discover DCs: DCFinder. The algorithm combines data structures called position list indexes with techniques based on predicate selectivity to efficiently validate DC candidates. Because the available data often contain errors, DCFinder is especially designed to discover approximate DCs, i.e., DCs that may partially hold. Our experimental evaluation uses real and synthetic datasets, and shows that DCFinder outperforms all the existing approximate DC discovery algorithms.
-
Risch, J., Krestel, R.: Domain-specific word embeddings for patent classification. Data Technologies and Applications. 53, 108-122 (2019).
Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application it can then be compared to previously granted patents in the same class. Automatic classification would be highly beneficial, because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, a challenge for the automation is patent-specific language use, such as special vocabulary and phrases. To account for this language use, we present domain-specific pre-trained word embeddings for the patent domain. We train our model on a very large dataset of more than 5 million patents and evaluate it at the task of patent classification. To this end, we propose a deep learning approach based on gated recurrent units for automatic patent classification built on the trained word embeddings. Experiments on a standardized evaluation dataset show that our approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. In this paper, we further investigate the model’s strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. The imbalanced training data and underrepresented classes are the most difficult remaining challenge.
-
Risch, J., Krestel, R.: Measuring and Facilitating Data Repeatability in Web Science. Datenbank-Spektrum. 19, 117-126 (2019).
Accessible and reusable datasets are a necessity to accomplish repeatable research. This requirement poses a problem particularly for web science, since scraped data comes in various formats and can change due to the dynamic character of the web. Further, usage of web data is typically restricted by copyright-protection or privacy regulations, which hinder publication of datasets. To alleviate these problems and reach what we define as “partial data repeatability”, we present a process that consists of multiple components. Researchers need to distribute only a scraper and not the data itself to comply with legal limitations. If a dataset is re-scraped for repeatability after some time, the integrity of different versions can be checked based on fingerprints. Moreover, fingerprints are sufficient to identify what parts of the data have changed and how much. We evaluate an implementation of this process with a dataset of 250 million online comments collected from five different news discussion platforms. We re-scraped the dataset after pausing for one year and show that less than ten percent of the data has actually changed. These experiments demonstrate that providing a scraper and fingerprints enables recreating a dataset and supports the repeatability of web science experiments.
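As a rough illustration of the fingerprinting idea (not the authors' implementation), the sketch below reduces each scraped record to a stable hash and compares the fingerprint sets of the original and the re-scraped dataset to quantify how much has changed.
```python
# Illustrative fingerprinting sketch (not the paper's implementation).
import hashlib
import json
from typing import Dict, Iterable

def fingerprint(record: Dict) -> str:
    # Serialize with sorted keys so the hash does not depend on key order.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_fraction(original: Iterable[Dict], rescraped: Iterable[Dict]) -> float:
    old = {fingerprint(r) for r in original}
    new = {fingerprint(r) for r in rescraped}
    if not old:
        return 0.0
    return 1.0 - len(old & new) / len(old)

comments_v1 = [{"id": 1, "text": "great article"}, {"id": 2, "text": "I disagree"}]
comments_v2 = [{"id": 1, "text": "great article"}, {"id": 2, "text": "[deleted]"}]
print(f"{changed_fraction(comments_v1, comments_v2):.0%} of records changed")  # 50%
```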
-
Jiang, L., Naumann, F.: Holistic Primary Key and Foreign Key Detection. Journal of Intelligent Information Systems. (2019).
Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys, and 91% of all foreign keys. We compare the performance of our algorithm with two baseline approaches that both assume the existence of primary keys.
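As a toy illustration of scoring a foreign-key candidate (HoPF's actual score functions and pruning rules are more elaborate), one might combine value-inclusion coverage with column-name similarity:
```python
# Toy foreign-key candidate scoring (illustrative; not HoPF's actual score functions).
from difflib import SequenceMatcher

def fk_score(fk_values, pk_values, fk_name, pk_name):
    # How many distinct FK values are contained in the PK column (IND coverage)?
    coverage = len(set(fk_values) & set(pk_values)) / max(len(set(fk_values)), 1)
    # How similar are the column names?
    name_sim = SequenceMatcher(None, fk_name.lower(), pk_name.lower()).ratio()
    return 0.7 * coverage + 0.3 * name_sim  # weights chosen arbitrarily

orders_customer_id = [1, 2, 2, 3]
customers_id = [1, 2, 3, 4]
print(round(fk_score(orders_customer_id, customers_id, "customer_id", "id"), 2))
```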
-
Kellermeier, T., Repke, T., Krestel, R.: Mining Business Relationships from Stocks and News. MIDAS@ECML-PKDD. (2019).
In today’s modern society and global economy, decision making processes are increasingly supported by data. Especially in financial businesses it is essential to know about how the players in our global or national market are connected. In this work we compare different approaches for creating company relationship graphs. In our evaluation we see similarities in relationships extracted from Bloomberg and Reuters business news and correlations in historic stock market data.
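A minimal sketch of the stock-market side of such a comparison, using made-up tickers and prices: compute daily returns and connect companies whose return series are strongly correlated.
```python
# Sketch: derive relationship edges from correlated daily stock returns (made-up data).
import numpy as np

prices = {  # hypothetical closing prices over five days
    "ACME": [100, 102, 101, 105, 107],
    "GLOBEX": [50, 51, 50, 53, 54],
    "INITECH": [200, 198, 202, 199, 203],
}

def daily_returns(series):
    p = np.asarray(series, dtype=float)
    return np.diff(p) / p[:-1]

tickers = list(prices)
edges = []
for i, a in enumerate(tickers):
    for b in tickers[i + 1:]:
        corr = float(np.corrcoef(daily_returns(prices[a]), daily_returns(prices[b]))[0, 1])
        if corr > 0.8:  # threshold chosen arbitrarily
            edges.append((a, b, round(corr, 2)))
print(edges)  # e.g. an edge between ACME and GLOBEX, whose returns move together
```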
-
Jain, N., Krestel, R.: Who is Mona L.? Identifying Mentions of Artworks in Historical Archives. Lecture Notes in Computer Science, Springer. 11799, 115-122 (2019).
Named entity recognition (NER) plays an important role in many information retrieval tasks, including automatic knowledge graph construction. Most NER systems are typically limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as art historical archives, the recognition of titles of artworks as named entities is of high importance. In this work, we focus on identifying mentions of artworks, e.g. paintings and sculptures, from historical archives. Current state-of-the-art NER tools are unable to adequately identify artwork titles due to the particular difficulties presented by this domain. The scarcity of training data for NER for cultural heritage poses further hindrances. To mitigate this, we propose a semi-supervised approach to create high-quality training data by leveraging existing cultural heritage resources. Our experimental evaluation shows significant improvement in NER performance for artwork titles as compared to a baseline approach.
-
Draisbach, U., Christen, P., Naumann, F.: Transforming Pairwise Duplicates to Entity Clusters for High Quality Duplicate Detection. ACM Journal on Data and Information Quality (JDIQ). (2019).
-
Repke, T., Krestel, R., Edding, J., Hartmann, M., Hering, J., Kipping, D., Schmidt, H., Scordialo, N., Zenner, A.: Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1-4. ACM (2018).
Emails play a major role in today's business communication, documenting not only work but also decision making processes. The large amount of heterogeneous data in these email corpora renders manual investigations by experts infeasible. Auditors or journalists, e.g., who are looking for irregular or inappropriate content or suspicious patterns, are in desperate need of computer-aided exploration tools to support their investigations. We present our Beacon system for the exploration of such corpora at different levels of detail. A distributed processing pipeline combines text mining methods and social network analysis to augment the already semi-structured nature of emails. The user interface ties into the resulting cleaned and enriched dataset. For the interface design we identify three objectives expert users have: gain an initial overview of the data to identify leads to investigate, understand the context of the information at hand, and have meaningful filters to iteratively focus onto a subset of emails. To this end we make use of interactive visualisations for rearranging and aggregating the extracted information to reveal salient patterns.
-
van Aken, B., Risch, J., Krestel, R., Löser, A.: Challenges for Toxic Comment Classification: An In-Depth Error Analysis. Proceedings of the 2nd Workshop on Abusive Language Online (co-located with EMNLP). pp. 33-42 (2018).
Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges, others remain unsolved, and directions for further research are needed. To this end, we compare different approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions for future research. These challenges include missing paradigmatic context and inconsistent dataset labels.
-
Risch, J., Krestel, R.: Learning Patent Speak: Investigating Domain-Specific Word Embeddings. Proceedings of the Thirteenth International Conference on Digital Information Management (ICDIM). pp. 63-68 (2018).
A patent examiner needs domain-specific knowledge to classify a patent application according to its field of invention. Standardized classification schemes help to compare a patent application to previously granted patents and thereby check its novelty. Due to the large volume of patents, automatic patent classification would be highly beneficial to patent offices and other stakeholders in the patent domain. However, a challenge for the automation of this costly manual task is the patent-specific language use. To facilitate this task, we present domain-specific pre-trained word embeddings for the patent domain. We trained our model on a very large dataset of more than 5 million patents to learn the language use in this domain. We evaluated the quality of the resulting embeddings in the context of patent classification. To this end, we propose a deep learning approach based on gated recurrent units for automatic patent classification built on the trained word embeddings. Experiments on a standardized evaluation dataset show that our approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches.
-
Risch, J., Krebs, E., Löser, A., Riese, A., Krestel, R.: Fine-Grained Classification of Offensive Language. Proceedings of GermEval (co-located with KONVENS). pp. 38-44 (2018).
Social media platforms receive massive amounts of user-generated content that may include offensive text messages. In the context of the GermEval task 2018, we propose an approach for fine-grained classification of offensive language. Our approach comprises a Naive Bayes classifier, a neural network, and a rule-based approach that categorize tweets. In addition, we combine the approaches in an ensemble to overcome weaknesses of the single models. We cross-validate our approaches with regard to macro-average F1-score on the provided training dataset.
-
Risch, J., Garda, S., Krestel, R.: Book Recommendation Beyond the Usual Suspects: Embedding Book Plots Together with Place and Time Information. Proceedings of the 20th International Conference On Asia-Pacific Digital Libraries (ICADL). pp. 227-239 (2018).
Content-based recommendation of books and other media is usually based on semantic similarity measures. While metadata can be compared easily, measuring the semantic similarity of narrative literature is challenging. Keyword-based approaches are biased to retrieve books of the same series or do not retrieve any results at all in sparser libraries. We propose to represent plots with dense vectors to foster semantic search for similar plots even if they do not have any words in common. Further, we propose to embed plots, places, and times in the same embedding space. Thereby, we allow arithmetics on these aspects. For example, a book with a similar plot but set in a different, user-specified place can be retrieved. We evaluate our findings on a set of 16,000 book synopses that spans literature from 500 years and 200 genres and compare our approach to a keyword-based baseline.
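The described embedding arithmetic can be sketched as follows, with tiny made-up vectors instead of the trained embeddings: subtract a book's place vector, add a different place, and retrieve the nearest plot.
```python
# Toy sketch of plot/place embedding arithmetic (3-d vectors made up for illustration).
import numpy as np

def nearest(query, candidates):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(candidates, key=lambda title: cos(query, candidates[title]))

plots = {  # hypothetical embeddings of book synopses
    "Sea adventure set in London": np.array([0.9, 0.1, 0.8]),
    "Sea adventure set in Tokyo": np.array([0.9, 0.8, 0.1]),
    "Castle romance set in Tokyo": np.array([0.1, 0.8, 0.2]),
}
places = {"London": np.array([0.0, 0.1, 0.9]), "Tokyo": np.array([0.0, 0.9, 0.1])}

# "A book with a similar plot, but set in Tokyo instead of London":
query = plots["Sea adventure set in London"] - places["London"] + places["Tokyo"]
print(nearest(query, plots))  # -> "Sea adventure set in Tokyo"
```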
-
Pietrangelo, A., Simonini, G., Bergamaschi, S., Naumann, F., Koumarelas, I.: Towards Progressive Search-driven Entity Resolution. SEBD (2018).
Keyword-search systems for databases aim to answer a user query composed of a few terms with a ranked list of records. They are powerful and easy-to-use data exploration tools for a wide range of contexts. For instance, given a product database gathered by scraping e-commerce websites, these systems enable even non-technical users to explore the item set (e.g., to check whether it contains certain products or not, or to discover the price of an item). However, if the database contains dirty records (i.e., incomplete and duplicated records), a preprocessing step to clean the data is required. One fundamental data cleaning step is Entity Resolution, i.e., the task of identifying and fusing together all the records that refer to the same real-world entity. This task is typically executed on the whole data, independently of: (i) the portion of the entities that a user may indicate through keywords, and (ii) the order priority that a user might express through an order by clause. This paper describes a first step to solve the problem of progressive search-driven Entity Resolution: resolving all the entities described by a user through a handful of keywords, progressively (according to an order by clause). We discuss the features of our method, named SearchER, and showcase some examples of keyword queries on two real-world datasets obtained with a demonstrative prototype that we have built.
-
Loster, M., Repke, T., Krestel, R., Naumann, F., Ehmueller, J., Feldmann, B., Maspfuhl, O.: The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities. Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM 2018). ACM (2018).
The integration of a wide range of structured and unstructured information sources into a uniformly integrated knowledge base is an important task in the financial sector. As an example, modern risk analysis methods can benefit greatly from an integrated knowledge base, building in particular a dedicated, domain-specific knowledge graph. Knowledge graphs can be used to gain a holistic view of the current economic situation so that systemic risks can be identified early enough to react appropriately. The use of this graphical structure thus allows the investigation of many financial scenarios, such as the impact of corporate bankruptcy on other market participants within the network. In this particular scenario, the links between the individual market participants can be used to determine which companies are affected by a bankruptcy and to what extent. We took these considerations as a motivation to start the development of a system capable of constructing and maintaining a knowledge graph of financial entities and their relationships. The envisioned system generates this particular graph by extracting and combining information from both structured data sources such as Wikidata and DBpedia, as well as from unstructured data sources such as newspaper articles and financial filings. In addition, the system should incorporate proprietary data sources, such as financial transactions (structured) and credit reports (unstructured). The ultimate goal is to create a system that recognizes financial entities in structured and unstructured sources, links them with the information of a knowledge base, and then extracts the relations expressed in the text between the identified entities. The constructed knowledge base can be used to construct the desired knowledge graph. Our system design consists of several components, each of which addresses a specific subproblem. To this end, Figure 1 gives a general overview of our system and its subcomponents.
-
Loster, M., Hegner, M., Naumann, F., Leser, U.: Dissecting Company Names using Sequence Labeling. Proceedings of the Conference "Lernen, Wissen, Daten, Analysen". pp. 227-238 (2018).
Understanding the inherent structure of company names by identifying their constituent parts yields valuable insights that can be leveraged by other tasks, such as named entity recognition, data cleansing, or deduplication. Unfortunately, segmenting company names poses a hard problem due to their high structural heterogeneity. Besides obvious elements, such as the core name or legal form, company names often contain additional elements, such as personal and location names, abbreviations, and other unexpected elements. While others have addressed the segmentation of person names, we are the first to address the segmentation of the more complex company names. We present a solution to the problem of automatically labeling the constituent name parts and their semantic role within German company names. To this end we propose and evaluate a collection of novel features used with a conditional random field classifier. In identifying the constituent parts of company names we achieve an accuracy of 84%, while classifying the colloquial names resulted in an F1 measure of 88%.
-
Loster, M., Naumann, F., Ehmueller, J., Feldmann, B.: CurEx: A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text. Proceedings of the ACM International Conference on Information and Knowledge Management. pp. 1883-1886. ACM (2018).
The integration of diverse structured and unstructured information sources into a unified, domain-specific knowledge base is an important task in many areas. A well-maintained knowledge base enables data analysis in complex scenarios, such as risk analysis in the financial sector or investigating large data leaks, such as the Paradise or Panama papers. Both the creation of such knowledge bases and their continuous maintenance and curation involve many complex tasks and considerable manual effort. With CurEx, we present a modular system that allows structured and unstructured data sources to be integrated into a domain-specific knowledge base. In particular, we (i) enable the incremental improvement of each individual integration component; (ii) enable the selective generation of multiple knowledge graphs from the information contained in the knowledge base; and (iii) provide two distinct user interfaces tailored to the needs of data engineers and end-users respectively. The former has curation capabilities and controls the integration process, whereas the latter focuses on the exploration of the generated knowledge graph.
-
Bunk, S., Krestel, R.: WELDA: Enhancing Topic Models by Incorporating Local Word Contexts. Joint Conference on Digital Libraries (JCDL 2018). ACM, Fort Worth, Texas, USA (2018).
The distributional hypothesis states that similar words tend to have similar contexts in which they occur. Word embedding models exploit this hypothesis by learning word vectors based on the local context of words. Probabilistic topic models on the other hand utilize word co-occurrences across documents to identify topically related words. Due to their complementary nature, these models define different notions of word similarity, which, when combined, can produce better topical representations. In this paper we propose WELDA, a new type of topic model, which combines word embeddings (WE) with latent Dirichlet allocation (LDA) to improve topic quality. We achieve this by estimating topic distributions in the word embedding space and exchanging selected topic words via Gibbs sampling from this space. We present an extensive evaluation showing that WELDA cuts runtime by at least 30% while outperforming other combined approaches with respect to topic coherence and for solving word intrusion tasks.
-
Exeler, C., Graber, M., Junge, T., Ramson, S., Ramson, C., Tschirschnitz, F., Naumann, F.: Piggyback Profiling: Enhancing Query Results with Metadata. Lernen. Wissen. Daten. Analysen. (LWDA) (2018).
-
Risch, J., Krestel, R.: Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (co-located with COLING). pp. 166-176 (2018).
Comment sections of online news providers have enabled millions to share and discuss their opinions on news topics. Today, moderators ensure respectful and informative discussions by deleting not only insults, defamation, and hate speech, but also unverifiable facts. This process has to be transparent and comprehensive in order to keep the community engaged. Further, news providers have to make sure to not give the impression of censorship or dissemination of fake news. Yet manual moderation is very expensive and becomes more and more unfeasible with the increasing amount of comments. Hence, we propose a semi-automatic, holistic approach, which includes comment features but also their context, such as information about users and articles. For evaluation, we present experiments on a novel corpus of 3 million news comments annotated by a team of professional moderators.
-
Risch, J., Krestel, R.: Aggression Identification Using Deep Learning and Data Augmentation. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (co-located with COLING). pp. 150-158 (2018).
Social media platforms allow users to share and discuss their opinions online. However, a minority of user posts is aggressive, thereby hindering respectful discussion, and, at an extreme level, is liable to prosecution. The automatic identification of such harmful posts is important, because it can support the costly manual moderation of online discussions. Further, the automation allows unprecedented analyses of discussion datasets that contain millions of posts. This system description paper presents our submission to the First Shared Task on Aggression Identification. We propose to augment the provided dataset to increase the number of labeled comments from 15,000 to 60,000. Thereby, we introduce linguistic variety into the dataset. As a consequence of the larger amount of training data, we are able to train a special deep neural net, which generalizes especially well to unseen data. To further boost the performance, we combine this neural net with three logistic regression classifiers trained on character and word n-grams, and hand-picked syntactic features. This ensemble is more robust than the individual single models. Our team named “Julian” achieves an F1-score of 60% on both English datasets, 63% on the Hindi Facebook dataset, and 38% on the Hindi Twitter dataset.
-
Repke, T., Krestel, R.: Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks. 40th European Conference on Information Retrieval (ECIR 2018). Springer, Grenoble, France (2018).
Email communication plays an integral part in everybody's life nowadays. Especially for business emails, extracting and analysing these communication networks can reveal interesting patterns of processes and decision making within a company. Fraud detection is another application area where precise detection of communication networks is essential. In this paper we present an approach based on recurrent neural networks to untangle email threads originating from forward and reply behaviour. We further classify parts of emails into 2 or 5 zones to capture not only header and body information but also greetings and signatures. We show that our deep learning approach outperforms state-of-the-art systems based on traditional machine learning and hand-crafted rules. Besides using the well-known Enron email corpus for our experiments, we additionally created a new annotated email benchmark corpus from Apache mailing lists.
-
Berti-Equille, L., Harmouch, H., Naumann, F., Novelli, N., Thirumuruganathan, S.: Discovery of Genuine Functional Dependencies from Relational Data with Missing Values. Proceedings of the VLDB Endowment (PVLDB). pp. 880-892 (2018).
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values or some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
-
Risch, J., Krestel, R.: My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections. Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 283-292 (2018).
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations.
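The entropy criterion can be illustrated with a small sketch using hypothetical counts: a word spread evenly across collections has high entropy and is treated as collection-independent, whereas a word concentrated in one collection is collection-specific.
```python
# Sketch: entropy of a word's distribution over collections (illustrative only).
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Occurrences of two words in (patents, papers, news), hypothetical counts.
print(entropy([100, 95, 105]))  # ~1.58 bits: evenly spread, collection-independent
print(entropy([120, 3, 2]))     # ~0.28 bits: concentrated, collection-specific
```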
-
Repke, T., Krestel, R.: Topic-aware Network Visualisation to Explore Large Email Corpora. International Workshop on Big Data Visual Exploration and Analytics (BigVis). CEUR-WS.org (2018).
Nowadays, more and more large datasets exhibit an intrinsic graph structure. While there exist special graph databases to handle ever increasing amounts of nodes and edges, visualising this data becomes infeasible quickly with growing data. In addition, looking at its structure is not sufficient to get an overview of a graph dataset. Indeed, visualising additional information about nodes or edges without cluttering the screen is essential. In this paper, we propose an interactive visualisation for social networks that positions individuals (nodes) on a two-dimensional canvas such that communities defined by social links (edges) are easily recognisable. Furthermore, we visualise topical relatedness between individuals by analysing information about social links, in our case email communication. To this end, we utilise document embeddings, which project the content of an email message into a high dimensional semantic space and graph embeddings, which project nodes in a network graph into a latent space reflecting their relatedness.
-
Lazaridou, K., Gruetze, T., Naumann, F.: Where in the World Is Carmen Sandiego? Detecting Person Locations via Social Media Discussions. Proceedings of the ACM Conference on Web Science. ACM (2018).
In today's social media, news often spread faster than in mainstream media, along with additional context and aspects about the current affairs. Consequently, users in social networks are up-to-date with the details of real-world events and the involved individuals. Examples include crime scenes and potential perpetrator descriptions, public gatherings with rumors about celebrities among the guests, rallies by prominent politicians, concerts by musicians, etc. We are interested in the problem of tracking persons mentioned in social media, namely detecting the locations of individuals by leveraging the online discussions about them. Existing literature focuses on the well-known and more convenient problem of user location detection in social media, mainly as the location discovery of the user profiles and their messages. In contrast, we track individuals with text mining techniques, regardless whether they hold a social network account or not. We observe what the community shares about them and estimate their locations. Our approach consists of two steps: firstly, we introduce a noise filter that prunes irrelevant posts using a recursive partitioning technique. Secondly, we build a model that reasons over the set of messages about an individual and determines his/her locations. In our experiments, we successfully trace the last U.S. presidential candidates through millions of tweets published from November 2015 until January 2017. Our results outperform previously introduced techniques and various baselines.
-
Ambroselli, C., Risch, J., Krestel, R., Loos, A.: Prediction for the Newsroom: Which Articles Will Get the Most Comments? Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). pp. 193-199. ACL, New Orleans, Louisiana, USA (2018).
The overwhelming success of the Web and mobile technologies has enabled millions to share their opinions publicly at any time. But the same success also endangers this freedom of speech due to closing down of participatory sites misused by individuals or interest groups. We propose to support manual moderation by proactively drawing the attention of our moderators to article discussions that most likely need their intervention. To this end, we predict which articles will receive a high number of comments. In contrast to existing work, we enrich the article with metadata, extract semantic and linguistic features, and exploit annotated data from a foreign language corpus. Our logistic regression model improves F1-scores by over 80% in comparison to state-of-the-art approaches.
-
Abedjan, Z., Golab, L., Naumann, F., Papenbrock, T.: Data Profiling. Morgan & Claypool Publishers (2018).
-
Sadiq, S., Dasu, T., Dong, X.L., Freire, J., Ilyas, I.F., Link, S., Miller, R.J., Naumann, F., Zhou, X., Srivastava, D.: Data Quality – The Role of Empiricism. SIGMOD Record. 46, 35-43 (2018).
-
Agrawal, D., Chawla, S., Kaoudi, Z., Kruse, S., Quiané-Ruiz, J.A., Contreras-Rojas, B., Elmagarmid, A., Idris, Y., Lucas, J., Mansour, E., Ouzzani, M., Papotti, P., Tang, N., Thirumuruganathan, S., Troudi, A.: RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -. Proceedings of the VLDB Endowment (PVLDB). 11, (2018).
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present Rheem, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) a robust interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with Rheem, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
-
Koumarelas, I., Kroschk, A., Mosley, C., Naumann, F.: Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection. J. Data and Information Quality. 10, 8:1-8:16 (2018).
Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure for each address attribute, which finally helps the classifier achieve the best F-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.
-
Bleifuß, T., Bornemann, L., Johnson, T., Kalashnikov, D.V., Naumann, F., Srivastava, D.: Exploring Change - A New Dimension of Data Analytics. Proceedings of the VLDB Endowment (PVLDB). 12, 85-98 (2018).
Data and metadata in datasets experience many different kinds of change. Values are inserted, deleted or updated; rows appear and disappear; columns are added or repurposed, etc. In such a dynamic situation, users might have many questions related to changes in the dataset, for instance which parts of the data are trustworthy and which are not? Users will wonder: How many changes have there been in the recent minutes, days or years? What kind of changes were made at which points of time? How dirty is the data? Is data cleansing required? The fact that data changed can hint at different hidden processes or agendas: a frequently crowd-updated city name may be controversial; a person whose name has been recently changed may be the target of vandalism; and so on. We show various use cases that benefit from recognizing and exploring such change. We envision a system and methods to interactively explore such change, addressing the variability dimension of big data challenges. To this end, we propose a model to capture change and the process of exploring dynamic data to identify salient changes. We provide exploration primitives along with motivational examples and measures for the volatility of data. We identify technical challenges that need to be addressed to make our vision a reality, and propose directions of future work for the data management community.
-
Kruse, S., Naumann, F.: Efficient Discovery of Approximate Dependencies. Proceedings of the VLDB Endowment. 11, 759-772 (2018).
See abstract for errata
Functional dependencies (FDs) and unique column combinations (UCCs) form a valuable ingredient for many data management tasks, such as data cleaning, schema recovery, and query optimization. Because these dependencies are unknown in most scenarios, their automatic discovery has been well researched. However, existing methods mostly discover only exact dependencies, i.e., those without violations. Real-world dependencies, in contrast, are frequently approximate due to data exceptions, ambiguities, or data errors. This relaxation to approximate dependencies renders their discovery an even harder task than the already challenging exact dependency discovery. To this end, we propose the novel and highly efficient algorithm Pyro to discover both approximate FDs and approximate UCCs. Pyro combines a separate-and-conquer search strategy with sampling-based guidance that quickly detects dependency candidates and verifies them. In our broad experimental evaluation, Pyro outperforms existing discovery algorithms by a factor of up to 33, scales to larger datasets, and at the same time requires the least main memory.

Errata / Corrigendum for "Efficient Discovery of Approximate Dependencies", Sebastian Kruse and Felix Naumann, Proceedings of the VLDB Endowment 11 (7), 759-772. Readers of the paper have pointed out a few minor errors, which we document here to ease the understanding of the algorithm.
Erratum 1) In Section 5.1, the PLI for "Last name" should read {{1, 4}, {3, 5}}.
Erratum 2) In Section 5.3, Example 4, the tuple pairs (t1, t3), (t1, t5), and (t2, t3) should yield the agree set sample AS = {({}, 1), ({First_name, Town}, 1), ({ZIP}, 1)}.
Erratum 3) In Section 5.3, the example AUCC error of the attribute combination A1...An should be 0.0099.
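As background for the erratum on position list indexes (PLIs): a PLI groups the row positions of each value and keeps only clusters of size greater than one. The sketch below is illustrative (the column values are made up so that the clusters match Erratum 1); Pyro's actual data structures are more involved.
```python
# Background sketch: building a position list index (PLI) for one column.
# Illustrative only; the column values are made up so the clusters match Erratum 1.
from collections import defaultdict

def pli(column):
    positions = defaultdict(list)
    for row_id, value in enumerate(column, start=1):
        positions[value].append(row_id)
    # Keep only clusters with more than one row; singleton clusters carry no evidence.
    return [cluster for cluster in positions.values() if len(cluster) > 1]

last_names = ["Miller", "Smith", "Jones", "Miller", "Jones"]
print(pli(last_names))  # [[1, 4], [3, 5]]
```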
-
Bornemann, L., Bleifuß, T., Kalashnikov, D., Naumann, F., Srivastava, D.: Data Change Exploration using Time Series Clustering. Datenbank-Spektrum. 18, 1-9 (2018).
Analysis of static data is one of the best studied research areas. However, data changes over time. These changes may reveal patterns or groups of similar values, properties, and entities. We study changes in large, publicly available data repositories by modelling them as time series and clustering these series by their similarity. In order to perform change exploration on real-world data we use the publicly available revision data of Wikipedia Infoboxes and weekly snapshots of IMDB. The changes to the data are captured as events, which we call change records. In order to extract temporal behavior we count changes in time periods and propose a general transformation framework that aggregates groups of changes to numerical time series of different resolutions. We use these time series to study different application scenarios of unsupervised clustering. Our explorative results show that changes made to collaboratively edited data sources can help find characteristic behavior, distinguish entities or properties and provide insight into the respective domains.
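A minimal sketch of the aggregation and clustering steps on hypothetical change records: count changes per entity and week to obtain time series, then assign each normalized series to the nearer of two seed centroids (a crude stand-in for a proper clustering algorithm).
```python
# Sketch: aggregate change records into weekly time series and group them (made-up data).
import numpy as np

changes = [("Berlin", 0), ("Berlin", 0), ("Berlin", 2), ("Paris", 0),
           ("Paris", 2), ("Movie_X", 1), ("Movie_X", 1), ("Movie_X", 1)]
weeks = 3

series = {}
for entity, week in changes:
    series.setdefault(entity, np.zeros(weeks))[week] += 1

# Normalize each series and assign it to the nearer of two seed centroids
# (a crude stand-in for a proper clustering algorithm such as k-means).
vectors = {entity: s / s.sum() for entity, s in series.items()}
centroid_a, centroid_b = vectors["Berlin"], vectors["Movie_X"]
for entity, v in vectors.items():
    cluster = "A" if np.linalg.norm(v - centroid_a) <= np.linalg.norm(v - centroid_b) else "B"
    print(entity, v, "->", cluster)  # Berlin and Paris land in A, Movie_X in B
```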
-
Kruse, S., Hahn, D., Walter, M., Naumann, F.: Metacrate: Organize and Analyze Millions of Data Profiles. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 2483-2486. ACM (2017).
-
Lazaridou, K., Krestel, R., Naumann, F.: Identifying Media Bias by Analyzing Reported Speech. International Conference on Data Mining. IEEE (2017).
Media analysis can reveal interesting patterns in the way newspapers report the news and how these patterns evolve over time. One example pattern is the quoting choices that media make, which could be used as bias indicators. Media slant can be expressed both with the choice of reporting an event, e.g. a person's statement, but also with the words used to describe the event. Thus, automatic discovery of systematic quoting patterns in the news could illustrate to the readers the media's beliefs, such as political preferences. In this paper, we aim to discover political media bias by demonstrating systematic patterns of reporting speech in two major British newspapers. To this end, we analyze news articles from 2000 to 2015. By taking into account different kinds of bias, such as selection, coverage and framing bias, we show that the quoting patterns of newspapers are predictable.
-
Maschler, F., Niephaus, F., Risch, J.: Real or Fake? Large-Scale Validation of Identity Leaks. 47. Jahrestagung der Gesellschaft für Informatik (INFORMATIK). pp. 2437-2448 (2017).
On the Internet, criminal hackers frequently leak identity data on a massive scale. Subsequent criminal activities, such as identity theft and misuse, put Internet users at risk. Leak checker services enable users to check whether their personal data has been made public. However, automatic crawling and identification of leak data is error-prone for different reasons. Based on a dataset of more than 180 million leaked identity records, we propose a software system that identifies and validates identity leaks to improve leak checker services. Furthermore, we present a proficient assessment of leak data quality and typical characteristics that distinguish valid and invalid leaks.
-
Zuo, Z., Loster, M., Krestel, R., Naumann, F.: Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types. Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA) (2017).
This paper establishes a semi-supervised strategy for extracting various types of complex business relationships from textual data by using only a few manually provided company seed pairs that exemplify the target relationship. Additionally, we offer a solution for determining the direction of asymmetric relationships, such as “ownership of”. We improve the reliability of the extraction process by using a holistic pattern identification method that classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relationship by using as few as five labeled seed pairs.
-
Harmouch, H., Naumann, F.: Cardinality Estimation: An Experimental Survey. Proceedings of the VLDB Endowment (PVLDB). pp. 499-512 (2017).
Data preparation and data profiling comprise many basic and complex tasks to analyze a dataset at hand and extract metadata, such as data distributions, key candidates, and functional dependencies. Among the most important types of metadata is the number of distinct values in a column, also known as the zeroth-frequency moment. Cardinality estimation itself has been an active research topic in the past decades due to its many applications. The aim of this paper is to review the literature of cardinality estimation and to present a detailed experimental study of twelve algorithms, scaling far beyond the original experiments. First, we outline and classify approaches to solve the problem of cardinality estimation: we describe their main ideas, error guarantees, advantages, and disadvantages. Our experimental survey then compares the performance of all twelve cardinality estimation algorithms. We evaluate the algorithms' accuracy, runtime, and memory consumption using synthetic and real-world datasets. Our results show that different algorithms excel in different categories, and we highlight their trade-offs.
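As background, one classical sketch-based estimator can be written in a few lines: the K-Minimum-Values (KMV) estimator, shown below in simplified form (illustrative only; not one of the twelve implementations benchmarked in the survey, and without the bounded-memory bookkeeping a real sketch would use).
```python
# Background sketch: K-Minimum-Values (KMV) cardinality estimator.
# Illustrative only; a real sketch keeps just the k smallest hashes instead of hashing
# everything into memory.
import hashlib

def kmv_estimate(values, k=256):
    def h(v):  # hash a value to a float in (0, 1]
        digest = hashlib.md5(str(v).encode()).hexdigest()
        return (int(digest, 16) + 1) / 2**128
    hashes = sorted({h(v) for v in values})
    if len(hashes) <= k:  # fewer than k distinct values: return the exact count
        return len(hashes)
    return int((k - 1) / hashes[k - 1])  # the k-th smallest hash determines the estimate

data = [i % 10_000 for i in range(200_000)]  # 10,000 distinct values
print(kmv_estimate(data))  # roughly 10,000
```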
-
Papenbrock, T., Naumann, F.: A Hybrid Approach for Efficient Unique Column Combination Discovery. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 195-204 (2017).
Unique column combinations (UCCs) are groups of attributes in relational datasets that contain no value-entry more than once. Hence, they indicate keys and serve data management tasks, such as schema normalization, data integration, and data cleansing. Because the unique column combinations of a particular dataset are usually unknown, UCC discovery algorithms have been proposed to find them. All previous such discovery algorithms are, however, inapplicable to datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present the hybrid discovery algorithm HyUCC, which uses the same discovery techniques as the recently proposed functional dependency discovery algorithm HyFD: A hybrid combination of fast approximation techniques and efficient validation techniques. With it, the algorithm discovers all minimal unique column combinations in a given dataset. HyUCC does not only outperform all existing approaches, it also scales to much larger datasets.
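The validation side of UCC discovery boils down to checking whether a column combination contains a duplicate value combination; a naive check (not HyUCC's optimized validation) looks like this:
```python
# Naive uniqueness check for a column combination (not HyUCC's optimized validation).
def is_ucc(rows, columns):
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # the value combination occurs more than once
        seen.add(key)
    return True

rows = [
    {"first": "Ada", "last": "Lovelace", "city": "London"},
    {"first": "Ada", "last": "Byron", "city": "London"},
    {"first": "Alan", "last": "Turing", "city": "London"},
]
print(is_ucc(rows, ["first"]))          # False: "Ada" appears twice
print(is_ucc(rows, ["first", "last"]))  # True: the combination is unique
```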
-
Abedjan, Z., Golab, L., Naumann, F.: Data Profiling (tutorial). Proceedings of the International Conference on Management of Data (SIGMOD) (2017).
-
Krestel, R., Risch, J.: How Do Search Engines Work? A Massive Open Online Course with 4000 Participants. Proceedings of the Conference Lernen, Wissen, Daten, Analysen. pp. 259-271 (2017).
Massive Open Online Courses (MOOCs) have introduced a new form of education. With thousands of participants per course, lecturers are confronted with new challenges in the teaching process. In this paper, we describe how we conducted an introductory information retrieval course for participants from all ages and educational backgrounds. We analyze different course phases and compare our experiences with regular on-site information retrieval courses at university.
-
Repke, T., Loster, M., Krestel, R.: Comparing Features for Ranking Relationships Between Financial Entities Based on Text. Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets. pp. 12:1-12:2. ACM, New York, NY, USA (2017).
Evaluating the credibility of a company is an important and complex task for financial experts. When estimating the risk associated with a potential asset, analysts rely on large amounts of data from a variety of different sources, such as newspapers, stock market trends, and bank statements. Finding relevant information, such as relationships between financial entities, in mostly unstructured data is a tedious task and examining all sources by hand quickly becomes infeasible. In this paper, we propose an approach to rank extracted relationships based on text snippets, such that important information can be displayed more prominently. Our experiments with different numerical representations of text have shown that an ensemble of methods performs best on the labelled data provided for the FEIII Challenge 2017.
-
Risch, J., Krestel, R.: What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL). pp. 40-46 (2017).
Research results manifest in large corpora of patents and scientific papers. However, both corpora lack a consistent taxonomy and references across different document types are sparse. Therefore, and because of contrastive, domain-specific language, recommending similar papers for a given patent (or vice versa) is challenging. We propose a hybrid recommender system that leverages topic distributions and key terms to recommend related work despite these challenges. As a case study, we evaluate our approach on patents and papers of two fields: medical and computer science. We find that topic-based recommenders complement term-based recommenders for documents with collection-specific language and increase mean average precision by up to 23%. As a result of our work, publications from both corpora form a joint digital library, which connects academia and industry.
-
Loster, M., Zuo, Z., Naumann, F., Maspfuhl, O., Thomas, D.: Improving Company Recognition from Unstructured Text by using Dictionaries. Proceedings of the International Conference on Extending Database Technology. pp. 610-619 (2017).
While named entity recognition is a much addressed research topic, recognizing companies in text is of particular difficulty. Company names are extremely heterogeneous in structure: a given company can be referenced in many different ways, and their names include person names, locations, acronyms, numbers, and other unusual tokens. Further, instead of using the official company name, quite different colloquial names are frequently used by the general public. We present a machine learning (CRF) system that reliably recognizes organizations in German texts. In particular, we construct and employ various dictionaries, regular expressions, text context, and other techniques to improve the results. In our experiments we achieved a precision of 91.11% and a recall of 78.82%, showing significant improvement over related work. Using our system we were able to extract 263,846 company mentions from a corpus of 141,970 newspaper articles.
-
Gruetze, T., Krestel, R., Lazaridou, K., Naumann, F.: What was Hillary Clinton doing in Katy, Texas? Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3-7 April, 2017. ACM (2017).
During the last presidential election in the United States of America, Twitter drew a lot of attention. This is because many leading persons and organizations, such as U.S. president Donald J. Trump, showed a strong affection to this medium. In this work we neglect the political contents and opinions shared on Twitter and focus on the question: Can we determine and track the physical location of the presidential candidates based on posts in the Twittersphere?
-
Kruse, S., Papenbrock, T., Dullweber, C., Finke, M., Hegner, M., Zabel, M., Zöllner, C., Naumann, F.: Fast Approximate Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 207-226 (2017).
-
Bleifuß, T., Johnson, T., Kalashnikov, D.V., Naumann, F., Shkapenyuk, V., Srivastava, D.: Enabling Change Exploration (Vision). Proceedings of the Fourth International Workshop on Exploratory Search in Databases and the Web (ExploreDB). pp. 1-3 (2017).
Data and metadata suffer many different kinds of change: values are inserted, deleted or updated, entities appear and disappear, properties are added or re-purposed, etc. Explicitly recognizing, exploring, and evaluating such change can alert to changes in data ingestion procedures, can help assess data quality, and can improve the general understanding of the dataset and its behavior over time. We propose a data model-independent framework to formalize such change. Our change-cube enables exploration and discovery of such changes to reveal dataset behavior over time.
-
Papenbrock, T., Naumann, F.: Data-driven Schema Normalization. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 342-353 (2017).
Ensuring Boyce-Codd Normal Form (BCNF) is the most popular way to remove redundancy and anomalies from datasets. Normalization to BCNF forces functional dependencies (FDs) into keys and foreign keys, which eliminates duplicate values and makes data constraints explicit. Despite being well researched in theory, converting the schema of an existing dataset into BCNF is still a complex, manual task, especially because the number of functional dependencies is huge and deriving keys and foreign keys is NP-hard. In this paper, we present a novel normalization algorithm called Normalize, which uses discovered functional dependencies to normalize relational datasets into BCNF. Normalize is entirely data-driven, which means that redundancy is removed only where it can be observed, and it is (semi-)automatic, which means that a user may or may not intervene in the normalization process. The algorithm introduces an efficient method for calculating the closure over sets of functional dependencies and novel features for choosing appropriate constraints. Our evaluation shows that Normalize can process millions of FDs within a few minutes and that the constraint selection techniques support the construction of meaningful relations during normalization.
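The closure computation at the heart of such key derivation follows the textbook fixpoint procedure: starting from an attribute set, repeatedly add the right-hand sides of all FDs whose left-hand sides are already covered. The Python sketch below shows this standard procedure on made-up FDs; it is not Normalize's optimized implementation.

    # Textbook attribute-set closure under a set of functional dependencies.
    def closure(attributes, fds):
        """attributes: iterable of attribute names; fds: list of (lhs_set, rhs_set)."""
        result = set(attributes)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs  # lhs is covered, so rhs is implied
                    changed = True
        return result

    fds = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
    print(closure({"A", "C"}, fds))  # {'A', 'B', 'C', 'D'}: {A, C} determines all attributes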
-
Tschirschnitz, F., Papenbrock, T., Naumann, F.: Detecting Inclusion Dependencies on Very Many Tables. ACM Transactions on Database Systems (TODS). 42, 18:1-18:29 (2017).
-
Heller, D., Krestel, R., Ohler, U., Vingron, M., Marsico, A.: ssHMM: Extracting Intuitive Sequence-Structure Motifs from High-Throughput RNA-Binding Protein Data. Nucleic Acids Research. 45, 11004--11018 (2017).
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on GitHub and as a Docker image.
-
Bleifuß, T., Kruse, S., Naumann, F.: Efficient Denial Constraint Discovery with Hydra. Proceedings of the VLDB Endowment (PVLDB). 11, 311-323 (2017).
Denial constraints (DCs) are a generalization of many other integrity constraints (ICs) widely used in databases, such as key constraints, functional dependencies, or order dependencies. Therefore, they can serve as a unified reasoning framework for all of these ICs and express business rules that cannot be expressed by the more restrictive IC types. The process of formulating DCs by hand is difficult, because it requires not only domain expertise but also database knowledge, and due to DCs' inherent complexity, this process is tedious and error-prone. Hence, an automatic DC discovery is highly desirable: we search for all valid denial constraints in a given database instance. However, due to the large search space, the problem of DC discovery is computationally expensive. We propose a new algorithm Hydra, which overcomes the quadratic runtime complexity in the number of tuples of state-of-the-art DC discovery methods. The new algorithm's experimentally determined runtime grows only linearly in the number of tuples. This results in a speedup by orders of magnitude, especially for datasets with a large number of tuples. Hydra can deliver results in a matter of seconds that to date took hours to compute.
-
Naumann, F., Krestel, R.: Das Fachgebiet „Informationssysteme“ am Hasso-Plattner-Institut. Datenbankspektrum. 17, 69-76 (2017).
-
Giesler, M.J., Keller, B., Repke, T., Leonhart, R., Weis, J., Muckelbauer, R., Rieckmann, N., Müller-Nordhorn, J., Lucius-Hoene, G., Holmberg, C.: Effect of a Website That Presents Patients' Experiences on Self-Efficacy and Patient Competence of Colorectal Cancer Patients: Web-Based Randomized Controlled Trial. J Med Internet Res. 19, e334 (2017).
Background: Patients often seek other patients' experiences with the disease. The Internet provides a wide range of opportunities to share and learn about other people's health and illness experiences via blogs or patient-initiated online discussion groups. There also exists a range of medical information devices that include experiential patient information. However, there are serious concerns about the use of such experiential information because narratives of others may be powerful and pervasive tools that may hinder informed decision making. The international research network DIPEx (Database of Individual Patients' Experiences) aims to provide scientifically based online information on people's experiences with health and illness to fulfill patients' needs for experiential information, while ensuring that the presented information includes a wide variety of possible experiences. Objective: The aim is to evaluate the colorectal cancer module of the German DIPEx website krankheitserfahrungen.de with regard to self-efficacy for coping with cancer and patient competence. Methods: In 2015, a Web-based randomized controlled trial was conducted using a two-group between-subjects design and repeated measures. The study sample consisted of individuals who had been diagnosed with colorectal cancer within the past 3 years or who had metastasis or recurrent disease. Outcome measures included self-efficacy for coping with cancer and patient competence. Participants were randomly assigned to either an intervention group that had immediate access to the colorectal cancer module for 2 weeks or to a waiting list control group. Outcome criteria were measured at baseline before randomization and at 2 weeks and 6 weeks. Results: The study randomized 212 persons. On average, participants were 54 (SD 11.1) years old, 58.8% (124/211) were female, and 73.6% (156/212) had read or heard stories of other patients online before entering the study, thus excluding any influence of the colorectal cancer module on krankheitserfahrungen.de. No intervention effects were found at 2 and 6 weeks after baseline. Conclusions: The results of this study do not support the hypothesis that the website studied may increase self-efficacy for coping with cancer or patient competencies such as self-regulation or managing emotional distress. Possible explanations may involve characteristics of the website itself, its use by participants, or methodological reasons. Future studies aimed at evaluating potential effects of websites providing patient experiences on the basis of methodological principles such as those of DIPEx might profit from extending the range of outcome measures, from including additional measures of website usage behavior and users' motivation, and from expanding concepts such as patient competency to include items that more directly reflect patients' perceived effects of using such a website. Trial Registration: Clinicaltrials.gov NCT02157454; https://clinicaltrials.gov/ct2/show/NCT02157454 (Archived by WebCite at http://www.webcitation.org/6syrvwXxi)
-
Krestel, R., Mottin, D., Müller, E. eds: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", Potsdam, Germany, September 12-14, 2016. CEUR-WS.org (2016).
-
Samiei, A., Koumarelas, I., Loster, M., Naumann, F.: Combination of Rule-based and Textual Similarity Approaches to Match Financial Entities. DSMM. ACM (2016).
Record linkage is a well-studied problem with many years of publication history. Nevertheless, many challenges remain to be addressed, such as the one posed by the FEIII Challenge 2016. Matching financial entities (FEs) is important for many private and governmental organizations. In this paper we describe the problem of matching such FEs across three datasets: FFIEC, LEI, and SEC.
-
Agrawal, D., Ba, L., Berti-Equille, L., Chawla, S., Elmagarmid, A., Hammady, H., Idris, Y., Kaoudi, Z., Khayyat, Z., Kruse, S., Ouzzani, M., Papotti, P., Quiané-Ruiz, J.-A., Tang, N., Zaki, M.J.: Rheem: Enabling Multi-Platform Task Execution (demo). Proceedings of the ACM SIGMOD conference (SIGMOD) (2016).
-
Ehrlich, J., Roick, M., Schulze, L., Zwiener, J., Papenbrock, T., Naumann, F.: Holistic Data Profiling: Simultaneous Discovery of Various Metadata. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 305-316. OpenProceedings.org (2016).
Data profiling is the discipline of examining an unknown dataset for its structure and statistical information. It is a preprocessing step in a wide range of applications, such as data integration, data cleansing, or query optimization. For this reason, many algorithms have been proposed for the discovery of different kinds of metadata. When analyzing a dataset, these profiling algorithms are often applied in sequence, but they do not support one another, for instance, by sharing I/O cost or pruning information. We present the holistic algorithm MUDS, which jointly discovers the three most important metadata: inclusion dependencies, unique column combinations, and functional dependencies. By sharing I/O cost and data structures across the different discovery tasks, MUDS can clearly increase the efficiency of traditional sequential data profiling. The algorithm also introduces novel inter-task pruning rules that build upon different types of metadata, e.g., unique column combinations to infer functional dependencies. We evaluate MUDS in detail and compare it against the sequential execution of state-of-the-art algorithms. A comprehensive evaluation shows that our holistic algorithm outperforms the baseline by a factor of up to 48 on datasets with favorable pruning conditions.
-
Kruse, S., Jentzsch, A., Papenbrock, T., Kaoudi, Z., Quiane-Ruiz, J.-A., Naumann, F.: RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 953-967. ACM, New York, NY, USA (2016).
Inclusion dependencies (INDs) form an important integrity constraint on relational databases, supporting data management tasks, such as join path discovery and query optimization. Conditional inclusion dependencies (CINDs), which define including and included data in terms of conditions, allow these capabilities to be transferred to RDF data. However, CIND discovery is computationally much more complex than IND discovery, and even on small RDF datasets the number of CINDs is intractably large. To cope with both problems, we first introduce the notion of pertinent CINDs with an adjustable relevance criterion to filter and rank CINDs based on their extent and implications among each other. Second, we present RDFind, a distributed system to efficiently discover all pertinent CINDs in RDF data. RDFind employs a lazy pruning strategy to drastically reduce the CIND search space. Also, its exhaustive parallelization strategy and robust data structures make it highly scalable. In our experimental evaluation, we show that RDFind is up to 419 times faster than the state-of-the-art, while considering a more general class of CINDs. Furthermore, it is capable of processing a very large dataset of billions of triples, which was entirely infeasible before.
-
Bleifuß, T., Bülow, S., Frohnhofen, J., Risch, J., Wiese, G., Kruse, S., Papenbrock, T., Naumann, F.: Approximate Discovery of Functional Dependencies for Large Datasets. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1803-1812. ACM, New York, NY, USA (2016).
Functional dependencies (FDs) are an important prerequisite for various data management tasks, such as schema normalization, query optimization, and data cleansing. However, automatic FD discovery entails an exponentially growing search and solution space, so that even today’s fastest FD discovery algorithms are limited to small datasets only, due to long runtimes and high memory consumption. To overcome this situation, we propose an approximate discovery strategy that sacrifices a possibly small amount of result correctness in return for large performance improvements. In particular, we introduce AID-FD, an algorithm that approximately discovers FDs within runtimes up to orders of magnitude faster than state-of-the-art FD discovery algorithms. We evaluate and compare our performance results with a focus on scalability in runtime and memory, and with measures for completeness, correctness, and minimality.
-
Park, J., Blume-Kohout, M., Krestel, R., Nalisnick, E., Smyth, P.: Analyzing NIH Funding Patterns over Time with Statistical Text Analysis. Scholarly Big Data: AI Perspectives, Challenges, and Ideas (SBD 2016) Workshop at AAAI 2016. AAAI (2016).
In the past few years, various government funding organizations, such as the U.S. National Institutes of Health and the U.S. National Science Foundation, have provided access to large publicly available online databases documenting the grants that they have funded over the past few decades. These databases provide an excellent opportunity for the application of statistical text analysis techniques to infer useful quantitative information about how funding patterns have changed over time. In this paper we analyze data from the National Cancer Institute (part of the National Institutes of Health) and show how text classification techniques provide a useful starting point for analyzing how funding for cancer research has evolved over the past 20 years in the United States.
-
Grundke, M., Jasper, J., Perchyk, M., Sachse, J.P., Krestel, R., Neves, M.: TextAI: Enhancing TextAE with Intelligent Annotation Support. Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM 2016). pp. 80-84. CEUR-WS.org (2016).
We present TextAI, an extension to the annotation tool TextAE that adds support for named-entity recognition and automated relation extraction based on machine learning techniques. Our learning approach is domain-independent and increases the quality of the detected relations with each added training document. We further aim at accelerating and facilitating the manual curation process for natural language documents by supporting simultaneous annotation by multiple users.
-
Godde, C., Lazaridou, K., Krestel, R.: Classification of German Newspaper Comments. Proceedings of the Conference Lernen, Wissen, Daten, Analysen. pp. 299-310. CEUR-WS.org (2016).
Online news has gradually become an inherent part of many people’s everyday lives, with the media enabling a social and interactive consumption of news as well. Readers openly express their perspectives and emotions about current events by commenting on news articles. They also form online communities and interact with each other by replying to other users’ comments. Due to their active and significant role in the diffusion of information, automatically gaining insights into the content of these comments is an interesting task. We are especially interested in finding systematic differences among the user comments from different newspapers. To this end, we propose the following classification task: Given a news comment thread of a particular article, identify the newspaper it comes from. Our corpus consists of six well-known German newspapers and their comments. We propose two experimental settings using SVM classifiers built on comment- and article-based features. We achieve a precision of up to 90% for individual newspapers.
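A minimal sketch of such a thread-level classification setting, assuming bag-of-words features and a linear SVM in scikit-learn; the toy threads, labels, and parameters are placeholders rather than the paper's actual features:

    # Illustrative newspaper classification of comment threads with an SVM.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    comment_threads = [
        "Ich stimme dem Artikel voll zu ...",        # concatenated comments of one thread
        "Typisch, wieder nur halbe Wahrheiten ...",  # another thread
    ]
    newspapers = ["zeit", "spiegel"]                 # label: the newspaper the thread belongs to

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(comment_threads, newspapers)
    print(model.predict(["Der Kommentarbereich ist mal wieder voll ..."]))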
-
Jenders, M., Krestel, R., Naumann, F.: Which Answer is Best? Predicting Accepted Answers in MOOC Forums. Proceedings of the 25th International Conference Companion on World Wide Web. pp. 679-684. International World Wide Web Conferences Steering Committee (2016).
Massive Open Online Courses (MOOCs) have grown in reach and importance over the last few years, enabling a vast userbase to enroll in online courses. Besides watching videos, users participate in discussion forums to further their understanding of the course material. As in other community-based question-answering platforms, in many MOOC forums a user posting a question can mark the answer they are most satisfied with. In this paper, we present a machine learning model that predicts this accepted answer to a forum question using historical forum data.
-
Abedjan, Z., Golab, L., Naumann, F.: Data Profiling (tutorial). International Conference on Data Engineering (ICDE) (2016).
One of the crucial requirements before consuming datasets for any application is to understand the dataset at hand and its metadata. The process of metadata discovery is known as data profiling. Profiling activities range from ad-hoc approaches, such as eye-balling random subsets of the data or formulating aggregation queries, to systematic inference of structural information and statistics of a dataset using dedicated profiling tools. In this tutorial, we highlight the importance of data profiling as part of any data-related use-case, and discuss the area of data profiling by classifying data profiling tasks and reviewing the state-of-the-art data profiling systems and techniques. In particular, we discuss hard problems in data profiling, such as algorithms for dependency discovery and profiling algorithms for dynamic data and streams. We conclude with directions for future research in the area of data profiling. This tutorial is based on our survey on profiling relational data [1].
-
Samiei, A., Naumann, F.: Cluster-based Sorted Neighborhood for Efficient Duplicate Detection. International Conference on Data Mining Workshops (ICDMW) (2016).
Duplicate detection aims to find multiple and syntactically different representations of the same real-world entities in a dataset. The naive way of duplicate detection entails a quadratic number of pair-wise record comparisons to identify the duplicates, which might take hours even for an average-sized dataset. As today's databases grow very fast, different candidate-selection methods, such as sorted neighborhood, blocking, canopy clustering, and their variations, address this problem by shrinking the comparison space. The volume and velocity of data change require ever faster and more flexible methods of duplicate detection. In particular, they need dynamic indices that can be updated efficiently as new data arrives. We present a novel approach, which combines the idea of cluster-based methods with the well-known sorted neighborhood method. It carefully filters out irrelevant candidate pairs, which are less likely to yield duplicates, by pre-clustering records based not only on their proximity after sorting, but also on their similarity in selected attributes. An empirical evaluation on synthetic and real-world datasets shows that our approach improves the overall runtime over existing approaches while maintaining comparable result quality. Moreover, it uses dynamic indices, which in turn makes it useful for deduplicating streaming data.
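For reference, the classic sorted neighborhood method that this approach builds on can be sketched in a few lines: sort records by a blocking key and compare only records inside a sliding window. The key function, window size, and similarity measure below are illustrative choices, not those of the paper.

    # Classic sorted neighborhood: sort by a key, compare within a sliding window.
    from difflib import SequenceMatcher

    def sorted_neighborhood(records, key, window=3, threshold=0.85):
        ordered = sorted(records, key=key)
        candidates = []
        for i, record in enumerate(ordered):
            for j in range(i + 1, min(i + window, len(ordered))):
                a, b = str(record), str(ordered[j])
                if SequenceMatcher(None, a, b).ratio() >= threshold:
                    candidates.append((record, ordered[j]))
        return candidates

    people = [
        {"name": "Jane Doe", "city": "Berlin"},
        {"name": "Jane  Doe", "city": "Berlin"},   # near-duplicate with an extra space
        {"name": "John Smith", "city": "Potsdam"},
    ]
    print(sorted_neighborhood(people, key=lambda r: r["name"].replace(" ", "").lower()))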
-
Gruetze, T., Krestel, R., Naumann, F.: Topic Shifts in StackOverflow: Ask it like Socrates. Lecture Notes in Computer Science. p. 213--221. Springer (2016).
Community-based question-and-answer (Q&A) sites rely on well-posed and appropriately tagged questions. However, most platforms have only limited capabilities to support their users in finding the right tags. In this paper, we propose a temporal recommendation model to support users in tagging new questions and thus improve their acceptance in the community. To underline the necessity of temporal awareness of such a model, we first investigate the changes in tag usage and show different types of collective attention in StackOverflow, a community-driven Q&A website for computer programming topics. Furthermore, we examine the changes over time in the correlation between question terms and topics. Our results show that temporal awareness is indeed important for recommending tags in Q&A communities.
-
Papenbrock, T., Naumann, F.: A Hybrid Approach to Functional Dependency Discovery. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 821-833. ACM, New York, NY, USA (2016).
Functional dependencies are structural metadata that can be used for schema normalization, data integration, data cleansing, and many other data management tasks. Despite their importance, the functional dependencies of a specific dataset are usually unknown and almost impossible to discover manually. For this reason, database research has proposed various algorithms for functional dependency discovery. None, however, are able to process datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present a hybrid discovery algorithm called HyFD, which combines fast approximation techniques with efficient validation techniques in order to find all minimal functional dependencies in a given dataset. While operating on compact data structures, HyFD not only outperforms all existing approaches, it also scales to much larger datasets.
-
Lazaridou, K., Krestel, R.: Identifying Political Bias in News Articles. International Conference on Theory and Practice of Digital Libraries. IEEE Technical Committee on Digital Libraries (2016).
TPDL Doctoral Consortium
The political leaning of individuals, such as journalists and politicians, often shapes public opinion on several issues. In the case of online journalism, due to the numerous ongoing events, newspapers have to choose which stories to cover, emphasize, and possibly express their opinion about. These choices depict their profile and could reveal a potential bias towards a certain perspective or political position. Likewise, politicians' choice of language and the issues they broach are an indication of their beliefs and political orientation. Given the amount of user-generated text content online, such as news articles, blog posts, and politician statements, automatically analyzing this information becomes increasingly interesting in order to understand what people stand for and how they influence the general public. In this PhD thesis, we analyze UK news corpora along with parliament speeches in order to identify potential political media bias. We currently examine mentions of politicians and their quotes in news articles and how this referencing pattern evolves over time.
-
Langer, P., Naumann, F.: Efficient Order Dependency Discovery. VLDB Journal. 25, 223-241 (2016).
Order dependencies (ODs) describe a relationship of order between lists of attributes in a relational table. ODs can help to understand the semantics of datasets and the applications producing them. They have applications in the field of query optimization by suggesting query rewrites. Also, the existence of an OD in a table can provide hints on which integrity constraints are valid for the domain of the data at hand. This work is the first to describe the discovery problem for order dependencies in a principled manner by characterizing the search space, developing and proving pruning rules, and presenting the algorithm Order, which finds all order dependencies in a given table. Order traverses the lattice of permutations of attributes in a level-wise bottom-up manner. In a comprehensive evaluation we show that it is efficient even for various large datasets. Note: Szlichta et al. propose a more efficient algorithm to discover order dependencies and, in their paper, also point out flaws of our proposal: Jaroslaw Szlichta, Parke Godfrey, Lukasz Golab, Mehdi Kargar, Divesh Srivastava: Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization, PVLDB 10(7), pp. 721-732, 2017, http://www.vldb.org/pvldb/vol10/p721-szlichta.pdf
-
Naumann, F., Krestel, R.: The Information Systems Group at HPI. SIGMOD Record. (2016).
-
Gruetze, T., Kasneci, G., Zuo, Z., Naumann, F.: CohEEL: Coherent and Efficient Named Entity Linking through Random Walks. Web Semantics: Science, Services and Agents on the World Wide Web. 37, 75--89 (2016).
In recent years, the ever-growing number of documents on the Web as well as in digital libraries has led to a considerable increase in valuable textual information about entities. Harvesting entity knowledge from these large text collections is a major challenge. It requires the linkage of textual mentions within the documents with their real-world entities. This process is called entity linking. Solutions to this entity linking problem have typically aimed at balancing the rate of linking correctness (precision) and the linking coverage rate (recall). While entity links in texts could be used to improve various Information Retrieval tasks, such as text summarization, document classification, or topic-based clustering, the linking precision is the decisive factor. For example, for topic-based clustering a method that produces mostly correct links would be more desirable than a high-coverage method that leads to more but also more uncertain clusters. We propose an efficient linking method that uses a random walk strategy to combine a precision-oriented and a recall-oriented classifier in such a way that a high precision is maintained, while recall is elevated to the maximum possible level without affecting precision. An evaluation on three datasets with distinct characteristics demonstrates that our approach outperforms seminal work in the area and shows higher precision and time performance than the most closely related state-of-the-art methods.
-
Kruse, S., Papenbrock, T., Harmouch, H., Naumann, F.: Data Anamnesis: Admitting Raw Data into an Organization. IEEE Data Engineering Bulletin. 39, 8-20 (2016).
Today’s internet offers a plethora of openly available datasets, bearing great potential for novel applications and research. Likewise, rich datasets slumber within organizations. However, all too often those datasets are available only as raw dumps and lack proper documentation or even a schema. Data anamnesis is the first step of any effort to work with such datasets: It determines fundamental properties regarding the datasets’ content, structure, and quality to assess their utility and to put them to use appropriately. Detecting such properties is a key concern of the research area of data profiling, which has developed several viable instruments, such as data type recognition and foreign key discovery. In this article, we perform an anamnesis of the MusicBrainz dataset, an openly available and complex discographic database. In particular, we employ data profiling methods to create data summaries and then further analyze those summaries to reverse-engineer the database schema, to understand the data semantics, and to point out tangible schema quality issues. We propose two bottom-up schema quality dimensions, namely conciseness and normality, that measure the fit of the schema with its data, in contrast to a top-down approach that compares a schema with its application requirements.
-
Kruse, S., Papenbrock, T., Naumann, F.: Scaling Out the Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 445-454 (2015).
Inclusion dependencies are among the most important database dependencies. In addition to their most prominent application – foreign key discovery – inclusion dependencies are an important input to data integration, query optimization, and schema redesign. With their discovery being a recurring data profiling task, previous research has proposed different algorithms to discover all inclusion dependencies within a given dataset. However, none of the proposed algorithms is designed to scale out, i.e., none can be distributed across multiple nodes in a computer cluster to increase performance. So on large datasets with many inclusion dependencies, these algorithms can take days to complete, even on high-performance computers. We introduce SINDY, an algorithm that efficiently discovers all unary inclusion dependencies of a given relational dataset in a distributed fashion and that is not tied to main memory requirements. We give a practical implementation of SINDY that builds upon the map-reduce-style framework Stratosphere and conduct several experiments showing that SINDY can process huge datasets several times faster than its competitors while scaling with the number of cluster nodes.
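The inversion idea underlying such unary IND discovery can be pictured on a single machine: map every value to the set of attributes it occurs in, then intersect these attribute sets per attribute. SINDY runs this as a distributed data flow on Stratosphere; the in-memory Python sketch below only illustrates the principle on toy data.

    # Unary IND candidates via value-to-attribute-set inversion (single-machine sketch).
    from collections import defaultdict

    def unary_inds(columns):
        """columns: dict mapping attribute name -> list of values."""
        value_to_attrs = defaultdict(set)
        for attr, values in columns.items():
            for v in values:
                value_to_attrs[v].add(attr)
        # For each attribute, intersect the attribute sets of all its values.
        candidates = {attr: None for attr in columns}
        for attrs in value_to_attrs.values():
            for attr in attrs:
                others = attrs - {attr}
                candidates[attr] = others if candidates[attr] is None else candidates[attr] & others
        return {(a, b) for a, deps in candidates.items() if deps for b in deps}

    columns = {"orders.customer_id": [1, 2, 3], "customers.id": [1, 2, 3, 4]}
    print(unary_inds(columns))  # {('orders.customer_id', 'customers.id')}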
-
Jentzsch, A., Mühleisen, H., Naumann, F.: Uniqueness, Density, and Keyness: Exploring Class Hierarchies. In Proceedings of the 6th International Workshop on Consuming Linked Data (COLD 2015), ISWC 2015, Bethlehem, PA, USA (2015).
-
Schmidt, D., Frohnhofen, J., Knebel, S., Meinel, F., Perchyk, M., Risch, J., Striebel, J., Wachtel, J., Baudisch, P.: Ergonomic Interaction for Touch Floors. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. pp. 3879-3888. ACM, Seoul, Republic of Korea (2015).
The main appeal of touch floors is that they are the only direct touch form factor that scales to arbitrary size, therefore allowing direct touch to scale to very large numbers of display objects. In this paper, however, we argue that the price for this benefit is bad physical ergonomics: prolonged standing, especially in combination with looking down, quickly causes fatigue and repetitive strain. We propose addressing this issue by allowing users to operate touch floors in any pose they like, including sitting and lying. To allow users to transition between poses seamlessly, we present a simple pose-aware view manager that supports users by adjusting the entire view to the new pose. We support the main assumption behind the work with a simple study that shows that several poses are indeed more ergonomic for touch floor interaction than standing. We ground the design of our view manager by analyzing which screen regions users can see and touch in each of the respective poses.
-
Jentzsch, A., Dullweber, C., Troiano, P., Naumann, F.: Exploring Linked Data Graph Structures. In Proceedings of the Posters and Demos Session, ISWC 2015, Bethlehem, PA, USA (2015).
-
Roick, M., Jenders, M., Krestel, R.: How to Stay Up-to-date on Twitter with General Keywords. Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. CEUR-WS.org (2015).
Microblogging platforms make it easy for users to share information through the publication of short personal messages. However, users are not only interested in sharing, but even more so in consuming information. As a result, they are confronted with new challenges when it comes to retrieving information on microblogging platforms. In this paper we present a query expansion method based on latent topics to support users interested in topical information. Similar to news aggregator sites, our approach identifies subtopics for a given query and provides the user with a quick overview of discussed topics within the microblogging platform. Using a document collection of microblog posts from Twitter as an exemplary microblogging platform, we compare the quality of search results returned by our algorithm with a baseline approach and a state-of-the-art microblog-specific query expansion method. To this end, we introduce a novel semi-supervised evaluation strategy based on expert Twitter users. In contrast to existing query expansion methods, our approach can be used to aggregate and visualize topical query results based on the calculated topic models, while achieving competitive results for traditional keyword-based search with regard to mean average precision.
-
Jenders, M., Lindhauer, T., Kasneci, G., Krestel, R., Naumann, F.: A Serendipity Model For News Recommendation. KI 2015: Advances in Artificial Intelligence - 38th Annual German Conference on AI, Dresden, Germany, September 21-25, 2015, Proceedings. pp. 111-123. Springer (2015).
Recommendation algorithms typically work by suggesting items that are similar to the ones that a user likes, or items that similar users like. We propose a content-based recommendation technique with the focus on serendipity of news recommendations. Serendipitous recommendations have the characteristic of being unexpected yet fortunate and interesting to the user, and thus might yield higher user satisfaction. In our work, we explore the concept of serendipity in the area of news articles and propose a general framework that incorporates the benefits of serendipity- and similarity-based recommendation techniques. An evaluation against other baseline recommendation models is carried out in a user study.
-
Kruse, S., Papotti, P., Naumann, F.: Estimating Data Integration and Cleaning Effort. Proceedings of the International Conference on Extending Database Technology (EDBT) (2015).
Data cleaning and data integration have been the topic of intensive research for at least the past thirty years, resulting in a multitude of specialized methods and integrated tool suites. All of them require at least some and in most cases significant human input in their configuration, during processing, and for evaluation. For managers (and for developers and scientists) it would therefore be of great value to be able to estimate the effort of cleaning and integrating some given data sets and to know the pitfalls of such an integration project in advance. This helps in deciding about an integration project using cost/benefit analysis, budgeting a team with funds and manpower, and monitoring its progress. Further, knowledge of how well a data source fits into a given data ecosystem improves source selection. We present an extensible framework for the automatic effort estimation for mapping and cleaning activities in data integration projects with multiple sources. It comprises a set of measures and methods for estimating integration complexity and ultimately effort, taking into account heterogeneities of both schemas and instances and regarding both integration and cleaning operations. Experiments on two real-world scenarios show that our proposal is two to four times more accurate than a current approach in estimating the time duration of an integration process, and provides a meaningful breakdown of the integration problems as well as the required integration activities.
-
Krestel, R., Werkmeister, T., Wiradarma, T.P., Kasneci, G.: Tweet-Recommender: Finding Relevant Tweets for News Articles. Proceedings of the 24th International World Wide Web Conference (WWW). ACM (2015).
Twitter has become a prime source for disseminating news and opinions. However, the length of tweets prohibits detailed descriptions; instead, tweets sometimes contain URLs that link to detailed news articles. In this paper, we devise generic techniques for recommending tweets for any given news article. To evaluate and compare the different techniques, we collected tens of thousands of tweets and news articles and conducted a user study on the relevance of recommendations.
-
Schubotz, T., Krestel, R.: Online Temporal Summarization of News Events. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). pp. 679-684. IEEE Computer Society (2015).
Nowadays, an ever-increasing number of news articles is published on a daily basis. Especially after notable national and international events or disasters, news coverage rises tremendously. Temporal summarization is an approach to automatically summarize such information in a timely manner. Summaries are created incrementally with progressing time, as soon as new information is available. Given a user-defined query, we designed a temporal summarizer based on probabilistic language models and entity recognition. First, all relevant documents and sentences are extracted from a stream of news documents using BM25 scoring. Second, a general query language model is created, which is used to detect sentences typical of the query via Kullback-Leibler divergence. Based on the retrieval result, this query model is extended over time by terms appearing frequently during the particular event. Our system is evaluated on a document corpus including test data provided by the Text Retrieval Conference (TREC).
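The sentence-scoring step can be illustrated with smoothed unigram language models compared via Kullback-Leibler divergence; the toy query, sentences, and smoothing below are stand-ins, and the full system additionally relies on BM25 retrieval and entity recognition as described above.

    # Score sentences by KL divergence between a query language model and each sentence model.
    import math
    from collections import Counter

    def unigram_lm(text, vocabulary, alpha=0.1):
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        return {w: (counts[w] + alpha) / (total + alpha * len(vocabulary)) for w in vocabulary}

    def kl_divergence(p, q):
        return sum(p[w] * math.log(p[w] / q[w]) for w in p)

    query = "earthquake damage rescue"
    sentences = [
        "Rescue teams assess the earthquake damage in the region.",
        "The stock market closed slightly higher on Monday.",
    ]
    vocab = set(query.lower().split()) | {w for s in sentences for w in s.lower().split()}
    query_model = unigram_lm(query, vocab)
    for s in sentences:
        print(round(kl_divergence(query_model, unigram_lm(s, vocab)), 3), s)  # lower = more typical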
-
Gruetze, T., Yao, G., Krestel, R.: Learning Temporal Tagging Behaviour. Proceedings of the 24th International Conference on World Wide Web Companion (WWW). p. 1333--1338. ACM (2015).
Social networking services, such as Facebook, Google+, and Twitter, are commonly used to share relevant Web documents with a peer group. By sharing a document with her peers, a user recommends the content for others and annotates it with a short description text. This short description yields many opportunities for text summarization and categorization. Because today’s social networking platforms are real-time media, the sharing behaviour is subject to many temporal effects, such as current events, breaking news, and trending topics. In this paper, we focus on the time-dependent hashtag usage of the Twitter community to annotate shared Web-text documents. We introduce a framework for time-dependent hashtag recommendation models and present two content-based models. Finally, we evaluate the introduced models with respect to recommendation quality on a Twitter dataset consisting of links to Web documents that were aligned with hashtags.
-
Hennig, P., Berger, P., Dullweber, C., Finke, M., Maschler, F., Risch, J., Meinel, C.: Social Media Story Telling. Proceedings of the 8th IEEE International Conference on Social Computing and Networking (SocialCom 2015). pp. 279-284, Chengdu, China (2015).
The number of documents on the web increases rapidly, and often there is an enormous information overlap between different sources covering the same topic. Since it is impractical to read through all posts regarding a subject, there is a need for summaries combining the most relevant facts. In this context, combining information from different sources in the form of stories is an important method to provide perspective, while presenting and enriching the existing content in an interesting, natural, and narrative way. Today, stories are often not available, or they have to be elaborately written and selected by journalists. Thus, we present an automated approach to create stories from multiple input documents. Furthermore, the developed framework implements strategies to visualize stories and link content to related sources of information, such as images, tweets, and encyclopedia records, ready to be explored by the reader. Our approach combines deriving a story line from a graph of interlinked sources with a story-centric multi-document summarization.
-
Rheinländer, A., Heise, A., Hueske, F., Leser, U., Naumann, F.: SOFA: An Extensible Logical Optimizer for UDF-heavy Data Flows. Information Systems. 52, 96-125 (2015).
Recent years have seen an increased interest in large-scale analytical data flows on non-relational data. These data flows are compiled into execution graphs scheduled on large compute clusters. In many novel application areas the predominant building blocks of such data flows are user-defined predicates or functions (UDFs). However, the heavy use of UDFs is not well taken into account for data flow optimization in current systems. SOFA is a novel and extensible optimizer for UDF-heavy data flows. It builds on a concise set of properties for describing the semantics of Map/Reduce-style UDFs and a small set of rewrite rules, which use these properties to find a much larger number of semantically equivalent plan rewrites than possible with traditional techniques. A salient feature of our approach is extensibility: We arrange user-defined operators and their properties into a subsumption hierarchy, which considerably eases integration and optimization of new operators. We evaluate SOFA on a selection of UDF-heavy data flows from different domains and compare its performance to three other algorithms for data flow optimization. Our experiments reveal that SOFA finds efficient plans, outperforming the best plans found by its competitors by a factor of up to six.
-
Krestel, R., Dokoohaki, N.: Diversifying Customer Review Rankings. Neural Networks. 66, 36-45 (2015).
E-commerce Web sites owe much of their popularity to consumer reviews accompanying product descriptions. On-line customers spend hours and hours going through heaps of textual reviews to decide which products to buy. At the same time, each popular product has thousands of user-generated reviews, making it impossible for a buyer to read everything. Current approaches to display reviews to users or recommend an individual review for a product are based on the recency or helpfulness of each review. In this paper, we present a framework to rank product reviews by optimizing the coverage of the ranking with respect to sentiment or aspects, or by summarizing all reviews with the top-K reviews in the ranking. To accomplish this, we make use of the assigned star rating for a product as an indicator for a review’s sentiment polarity and compare bag-of-words (language model) with topic models (latent Dirichlet allocation) as a means to represent aspects. Our evaluation on manually annotated review data from a commercial review Web site demonstrates the effectiveness of our approach, outperforming plain recency ranking by 30% and obtaining best results by combining language and topic model representations.
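The coverage-oriented ranking can be pictured as a greedy selection that always picks the review adding the most not-yet-covered aspects; the hand-assigned aspect sets below are stand-ins for the language-model or topic-model representations used in the paper.

    # Greedy top-k review selection maximizing marginal aspect coverage.
    def greedy_coverage_ranking(reviews, k):
        """reviews: list of (review_text, set_of_aspects)."""
        covered, ranking = set(), []
        remaining = list(reviews)
        for _ in range(min(k, len(remaining))):
            best = max(remaining, key=lambda r: len(r[1] - covered))
            ranking.append(best)
            covered |= best[1]
            remaining.remove(best)
        return ranking

    reviews = [
        ("Great battery, decent screen", {"battery", "screen"}),
        ("Battery lasts forever", {"battery"}),
        ("Camera is superb, screen too dim", {"camera", "screen"}),
    ]
    print(greedy_coverage_ranking(reviews, k=2))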
-
Papenbrock, T., Heise, A., Naumann, F.: Progressive Duplicate Detection. IEEE Transactions on Knowledge and Data Engineering (TKDE). 27, 1316-1329 (2015).
Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, duplicate detection methods need to process ever larger datasets in ever shorter time: maintaining the quality of a dataset becomes increasingly difficult. We present two novel, progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates if the execution time is limited: They maximize the gain of the overall process within the time available by reporting most results much earlier than traditional approaches. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
-
Papenbrock, T., Kruse, S., Quiane-Ruiz, J.-A., Naumann, F.: Divide & Conquer-based Inclusion Dependency Discovery. Proceedings of the VLDB Endowment. 8, 774-785 (2015).
The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help to perform data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs gets harder as datasets become larger in terms of number of tuples as well as attributes. To this end, we propose BINDER, an IND detection system that is capable of detecting both unary and n-ary INDs. It is based on a divide & conquer approach, which allows it to handle very large datasets – an important property in the face of the ever-increasing size of today’s data. In contrast to most related works, we do not rely on existing database functionality nor assume that inspected datasets fit into main memory. This renders BINDER an efficient and scalable competitor. Our exhaustive experimental evaluation shows the clear superiority of BINDER over the state of the art in both unary (SPIDER) and n-ary (MIND) IND discovery. BINDER is up to 26x faster than SPIDER and more than 2500x faster than MIND.
-
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB Journal. 24, 557-581 (2015).
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
-
Papenbrock, T., Bergmann, T., Finke, M., Zwiener, J., Naumann, F.: Data Profiling with Metanome. Proceedings of the VLDB Endowment. 8, 1860-1871 (2015).
Data profiling is the discipline of discovering metadata about given datasets. The metadata itself serve a variety of use cases, such as data integration, data cleansing, or query optimization. Due to the importance of data profiling in practice, many tools have emerged that support data scientists and IT professionals in this task. These tools provide good support for profiling statistics that are easy to compute, but they are usually lacking automatic and efficient discovery of complex statistics, such as inclusion dependencies, unique column combinations, or functional dependencies. We present Metanome, an extensible profiling platform that incorporates many state-of-the-art profiling algorithms. While Metanome is able to calculate simple profiling statistics in relational data, its focus lies on the automatic discovery of complex metadata. Metanome’s goal is to provide novel profiling algorithms from research, perform comparative evaluations, and to support developers in building and testing new algorithms. In addition, Metanome is able to rank profiling results according to various metrics and to visualize the at times large metadata sets.
-
Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., Naumann, F.: Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. Proceedings of the VLDB Endowment. 8, 1082-1093 (2015).
Functional dependencies are important metadata used for schema normalization, data cleansing and many other tasks. The efficient discovery of functional dependencies in tables is a well-known challenge in database research and has seen several approaches. Because no comprehensive comparison between these algorithms exists to date, it is hard to choose the best algorithm for a given dataset. In this experimental paper, we describe, evaluate, and compare the seven most cited and most important algorithms, all solving this same problem. First, we classify the algorithms into three different categories, explaining their commonalities. We then describe all algorithms with their main ideas. The descriptions provide additional details where the original papers were ambiguous or incomplete. Our evaluation of careful re-implementations of all algorithms spans a broad test space including synthetic and real-world data. We show that all functional dependency algorithms optimize for certain data characteristics and provide hints on when to choose which algorithm. In summary, however, all current approaches scale surprisingly poorly, showing potential for future research.
-
Langer, P., Schulze, P., George, S., Kohnen, M., Metzke, T., Abedjan, Z., Kasneci, G.: Assigning Global Relevance Scores to DBpedia Facts. International Workshop on Data Engineering meets the Semantic Web (DESWeb), Chicago, IL (2014).
-
Abedjan, Z., Schulze, P., Naumann, F.: DFD: Efficient Discovery of Functional Dependencies. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), Shanghai, China. pp. 949-958 (2014).
The discovery of unknown functional dependencies in a dataset is of great importance for database redesign, anomaly detection, and data cleansing applications. However, as the nature of the problem is exponential in the number of attributes, none of the existing approaches can be applied to large datasets. We present a new algorithm, DFD, for discovering all functional dependencies in a dataset following a depth-first traversal strategy of the attribute lattice that combines aggressive pruning and efficient result verification. Our approach is able to scale far beyond existing algorithms for up to 7.5 million tuples, and is up to three orders of magnitude faster than existing approaches on smaller datasets. Winner of the CIKM 2014 Best Student Paper Award.
-
Meyer, A., Pufahl, L., Batoulis, K., Kruse, S., Lindhauer, T., Stoff, T., Fahland, D., Weske, M.: Data Perspective in Process Choreographies: Modeling and Execution. 26th International Conference on Advanced Information Systems Engineering, Thessaloniki, Greece (2014).
-
Abedjan, Z., Naumann, F.: Amending RDF Entities with New Facts. Know@LOD Workshop in conjunction with ESWC, Crete, Greece (2014).
Selected for Best Workshop Paper Award.
-
Abedjan, Z., Gruetze, T., Jentzsch, A., Naumann, F.: Profiling and Mining RDF Data with ProLOD++. Proceedings of the IEEE International Conference on Data Engineering (ICDE), Demo, Chicago, IL (2014).
-
Forchhammer, B., Jentzsch, A., Naumann, F.: LODOP - Multi-Query Optimization for Linked Data Profiling Queries. In Proceedings of the International Workshop on Dataset PROFIling & fEderated Search for Linked Data (PROFILES) in conjunction with ESWC, Heraklion, Greece (2014).
Selected for Best Workshop Paper Award.
-
Rheinländer, A., Beckmann, M., Kunkel, A., Heise, A., Stoltmann, T., Leser, U.: Versatile optimization of UDF-heavy data flows with SOFA (demo). Proceedings of the SIGMOD conference. pp. 685-688 (2014).
-
Abedjan, Z., Quiané-Ruiz, J.-A., Naumann, F.: Detecting Unique Column Combinations on Dynamic Data. Proceedings of the IEEE International Conference on Data Engineering (ICDE), Chicago, IL (2014).
-
Heise, A., Kasneci, G., Naumann, F.: Estimating the Number and Sizes of Fuzzy-Duplicate Clusters. Proceedings of the Conference on Information and Knowledge Management (CIKM). pp. 959-968 (2014).
-
Zuo, Z., Kasneci, G., Gruetze, T., Naumann, F.: BEL: Bagging for Entity Linking. 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland (2014).
-
Vogel, T., Naumann, F.: Semi-Supervised Consensus Clustering: Reducing Human Effort. Proceedings of the International Workshop on Data Integration and Applications (2014).
Machine-based clustering yields fuzzy results. For example, when detecting duplicates in a dataset, different tools might end up with different clusterings. Eventually, a decision needs to be made, defining which records are in the same cluster, i. e., are duplicates. Such a definitive result is called a Consensus Clustering and can be created by evaluating the clustering attempts against each other and only resolving the disagreements by human experts. Yet, there can be different consensus clusterings, depending on the choice of disagreements presented to the human expert. In particular, they may require a different number of manual inspections. We present a set of strategies to select the smallest set of manual inspections to arrive at a consensus clustering and evaluate their efficiency on a set of real-world and synthetic datasets.
-
Gruetze, T., Kasneci, G., Zuo, Z., Naumann, F.: Bootstrapping Wikipedia to Answer Ambiguous Person Name Queries. 10th International Workshop on Information Integration on the Web (IIWeb), Chicago, IL (2014).
-
Krestel, R., Bergler, S., Witte, R.: Modeling human newspaper readers: The Fuzzy Believer approach. Natural Language Engineering. 20, 261--288 (2014).
The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our Fuzzy Believer system presented in this paper is to extract and analyze statements of opinion from newspaper articles. Beliefs are modeled using fuzzy set theory, applied after Natural Language Processing-based information extraction. The Fuzzy Believer models a human agent, deciding which statements to believe or reject based on a range of configurable strategies.
-
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal. (2014).
Selected for 2014 Semantic Web journal outstanding paper award.
-
Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J.-C., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The Stratosphere Platform for Big Data Analytics. The VLDB Journal. 23, 939-964 (2014).
We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere’s features include “in situ” data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of “Big Data” use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. In this paper, we present the overall system architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system’s components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.
-
Vogel, T., Heise, A., Draisbach, U., Lange, D., Naumann, F.: Reach for Gold: An Annealing Standard to Evaluate Duplicate Detection Results. JDIQ. 5 (2014).
-
Lorey, J.: Identifying and Determining SPARQL Endpoint Characteristics. International Journal of Web Information Systems. 10 (2014).
Publicly accessible SPARQL endpoints contain vast amounts of knowledge from a large variety of domains. Utilizing the structured query language, users can consume, integrate, and present data from such Linked Data sources for different application scenarios. However, oftentimes these endpoints are not configured to process specific workloads as efficiently as possible. Implemented restrictions further impede data consumption, e.g., by limiting the number of results returned per request. Assisting users in leveraging SPARQL endpoints requires insight into functional and non-functional properties of these knowledge bases. In this work, we introduce several metrics that enable universal and fine-grained characterization of arbitrary Linked Data repositories. We present comprehensive approaches for deriving these metrics and validate them through extensive evaluation on real-world SPARQL endpoints. Finally, we discuss possible implications of our findings for data consumers.
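One such non-functional property, response latency together with an apparent result-size cap, can be probed with a few lines using the SPARQLWrapper library; the probe below is a simplified stand-in for the metrics defined in the paper, and the endpoint and limit are example choices.

    # Probe a public SPARQL endpoint for latency and an apparent result-size cap.
    import time
    from SPARQLWrapper import SPARQLWrapper, JSON

    def probe_endpoint(endpoint_url, limit=20000):
        sparql = SPARQLWrapper(endpoint_url)
        sparql.setReturnFormat(JSON)
        sparql.setQuery(f"SELECT ?s WHERE {{ ?s ?p ?o }} LIMIT {limit}")
        start = time.time()
        results = sparql.query().convert()
        latency = time.time() - start
        returned = len(results["results"]["bindings"])
        return {"latency_s": round(latency, 2),
                "returned": returned,
                "capped_below_limit": returned < limit}

    print(probe_endpoint("https://dbpedia.org/sparql"))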
-
Rheinländer, A., Heise, A., Hueske, F., Leser, U., Naumann, F.: SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows. (2013).
-
Albrecht, A., Naumann, F.: Systematic ETL Management – Experiences with High-Level Operators. Proceedings of the 18th International Conference on Information Quality (ICIQ), Little Rock, AR (2013).
-
Abedjan, Z., Naumann, F.: Synonym Analysis for Predicate Expansion. Proceedings of the Extended Semantic Web Conference (ESWC), Montpellier, France (2013).
-
Lorey, J., Naumann, F.: Caching and Prefetching Strategies for SPARQL Queries. ESWC 2013 Satellite Events -- Revised Selected Papers, Montpellier, France (2013).
Linked Data repositories offer a wealth of structured facts, useful for a wide array of application scenarios. However, retrieving this data using SPARQL queries yields a number of challenges, such as limited endpoint capabilities and availability, or high latency for connecting to it. To cope with these challenges, we argue that it is advantageous to cache data that is relevant for future information needs. However, instead of only retaining results of previously issued queries, we aim at retrieving data that is potentially interesting for subsequent requests in advance. To this end, we present different methods to modify the structure of a query so that the altered query can be used to retrieve additional related information. We evaluate these approaches by applying them to requests found in real-world SPARQL query logs.
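A toy sketch of one conceivable query modification, assuming a simple string-level rewrite: lifting a concrete resource into a variable broadens the query so that its result can also serve related follow-up requests. The paper's modification methods are more systematic than this.

```python
# Illustrative only: broaden a query by replacing a fixed resource with a variable,
# so the prefetched result covers related requests as well. Query and IRI are made up.
original = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
}
"""

def broaden(query, resource_iri, new_var="?subject"):
    """Replace a fixed resource with a variable and also select it."""
    broadened = query.replace(resource_iri, new_var)
    return broadened.replace("SELECT ?abstract", f"SELECT {new_var} ?abstract")

print(broaden(original, "<http://dbpedia.org/resource/Berlin>"))
```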
-
Jenders, M., Kasneci, G., Naumann, F.: Analyzing and Predicting Viral Tweets. Proceedings of the WWW '13 Companion: 22nd International World Wide Web Conference, Rio de Janeiro, Brazil (2013).
Twitter and other microblogging services have become indispensable sources of information in today's web. Understanding the main factors that make certain pieces of information spread quickly in these platforms can be decisive for the analysis of opinion formation and many other opinion mining tasks. This paper addresses important questions concerning the spread of information on Twitter. What makes Twitter users retweet a tweet? Is it possible to predict whether a tweet will become "viral", i.e., will be frequently retweeted? To answer these questions we provide an extensive analysis of a wide range of tweet and user features regarding their influence on the spread of tweets. The most impactful features are chosen to build a learning model that predicts viral tweets with high accuracy. All experiments are performed on a real-world dataset, extracted through a public Twitter API based on user IDs from the TREC 2011 microblog corpus.
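For intuition only, a tiny sketch of the prediction step: a classifier over a handful of toy tweet features. The feature set, the toy data, and the model choice are assumptions; the paper analyzes a much wider range of features on real Twitter data.

```python
# Illustrative sketch: predict "viral" (frequently retweeted) from a few toy features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per tweet: follower count, #hashtags, #mentions, has_url.
X = np.array([[120, 0, 1, 0], [50000, 2, 0, 1], [300, 1, 0, 0], [80000, 3, 1, 1]])
y = np.array([0, 1, 0, 1])  # 1 = retweeted more often than some chosen threshold

model = LogisticRegression().fit(X, y)
print(model.predict([[60000, 2, 0, 1]]))  # predicted virality for an unseen tweet
```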
-
Lorey, J., Naumann, F.: Caching and Prefetching Strategies for SPARQL Queries. Proceedings of the 3rd International Workshop on Usage Analysis and the Web of Data (USEWOD), Montpellier, France (2013).
Selected as Best Workshop Paper for publication in the ESWC post-proceedings.
-
Lorey, J.: Storing and Provisioning Linked Data as a Service. Proceedings of the 10th Extended Semantic Web Conference (ESWC), Montpellier, France (2013).
Linked Data offers novel opportunities for aggregating information about a wide range of topics and for a multitude of applications. While the technical specifications of Linked Data have been a major research undertaking for the last decade, there is still a lack of real-world data and applications exploiting this data. Partly, this is due to the fact that datasets remain isolated from one another and their integration is a non-trivial task. In this work, we argue for a Data-as-a-Service approach combining both warehousing and query federation to discover and consume Linked Data. We compare our work to state-of-the-art approaches for discovering, integrating, and consuming Linked Data. Moreover, we illustrate a number of challenges when combining warehousing with federation features, and highlight key aspects of our research.
-
Lorey, J.: SPARQL Endpoint Metrics for Quality-Aware Linked Data Consumption. Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services (iiWAS '13), Vienna, Austria (2013).
In recent years, dozens of publicly accessible Linked Data repositories containing vast amounts of knowledge presented in the Resource Description Framework (RDF) format have been set up worldwide. By utilizing the SPARQL query language, users can consume, integrate, and present data from a federation of sources for different application scenarios. However, several challenges arise for distributed query processing across multiple SPARQL endpoints, such as devising suitable query optimization or result caching strategies. For implementing these techniques, one crucial aspect is determining appropriate endpoint features. In this work, we introduce several metrics that enable universal and fine-grained characterization of arbitrary Linked Data repositories. We present comprehensive approaches for deriving these metrics and validate them through extensive evaluation on real-world SPARQL endpoints. Finally, we discuss possible implications of our findings for data consumers.
-
Leich, M., Adamek, J., Schubotz, M., Heise, A., Rheinländer, A., Markl, V.: Applying Stratosphere for Big Data Analytics. Database Systems for Business, Technology, and Web (BTW) (2013).
-
Lange, D., Naumann, F.: Bulk Sorted Access for Efficient Top-k Retrieval. Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM), Baltimore, Maryland (2013).
-
Draisbach, U., Naumann, F.: On Choosing Thresholds for Duplicate Detection. Proceedings of the 18th International Conference on Information Quality (ICIQ), Little Rock, AR, USA (2013).
-
Lacoste-Julien, S., Palla, K., Davies, A., Kasneci, G., Graepel, T., Ghahramani, Z.: SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases. Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2013).
-
Forchhammer, B., Papenbrock, T., Stening, T., Viehmeier, S., Draisbach, U., Naumann, F.: Duplicate Detection on GPUs. Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW). pp. 165-184 (2013).
Runner-up for the Best Paper Award.
With the ever-increasing volume of data and the ability to integrate different data sources, data quality problems abound. Duplicate detection, as an integral part of data cleansing, is essential in modern information systems. We present a complete duplicate detection workflow that utilizes the capabilities of modern graphics processing units (GPUs) to increase the efficiency of finding duplicates in very large datasets. Our solution covers several well-known algorithms for pair selection, attribute-wise similarity comparison, record-wise similarity aggregation, and clustering. We redesigned these algorithms to run memory-efficiently and in parallel on the GPU. Our experiments demonstrate that the GPU-based workflow is able to outperform a CPU-based implementation on large, real-world datasets. For instance, the GPU-based algorithm deduplicates a dataset with 1.8m entities 10 times faster than a common CPU-based algorithm using comparably priced hardware.
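The following CPU-only sketch walks through the four workflow stages on a toy dataset; the character-level Jaccard similarity, the 0.8 threshold, and the blocking key are illustrative assumptions, and the paper's contribution lies in running the comparison stages as GPU kernels.

```python
# CPU-side sketch of the four stages (pair selection, attribute similarity,
# aggregation, clustering); not the paper's GPU implementation.
from itertools import combinations

records = {
    1: {"name": "Jon Smith",  "city": "Berlin"},
    2: {"name": "John Smith", "city": "Berlin"},
    3: {"name": "Mary Jones", "city": "Potsdam"},
}

def jaccard(a, b):
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

# 1) Pair selection: block on the city attribute instead of comparing all pairs.
blocks = {}
for rid, rec in records.items():
    blocks.setdefault(rec["city"], []).append(rid)
pairs = [p for ids in blocks.values() for p in combinations(ids, 2)]

# 2) + 3) Attribute-wise similarity and record-wise aggregation (simple average).
def record_sim(r1, r2):
    return sum(jaccard(r1[a], r2[a]) for a in r1) / len(r1)

matches = [(i, j) for i, j in pairs if record_sim(records[i], records[j]) > 0.8]

# 4) Clustering via transitive closure of the match pairs.
clusters = {rid: {rid} for rid in records}
for i, j in matches:
    merged = clusters[i] | clusters[j]
    for rid in merged:
        clusters[rid] = merged
print(matches, clusters)
```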
-
Lorey, J., Naumann, F.: Detecting SPARQL Query Templates for Data Prefetching. Proceedings of the 10th Extended Semantic Web Conference (ESWC), Montpellier, France (2013).
Publicly available Linked Data repositories provide a multitude of information. By utilizing SPARQL, Web sites and services can consume this data and present it in a user-friendly form, e.g., in mash-ups. To gather RDF triples for this task, machine agents typically issue similarly structured queries with recurring patterns against the SPARQL endpoint. These queries usually differ only in a small number of individual triple pattern parts, such as resource labels or literals in objects. We present an approach to detect such recurring patterns in queries and introduce the notion of query templates, which represent clusters of similar queries exhibiting these recurrences. We describe a matching algorithm to extract query templates and illustrate the benefits of prefetching data by utilizing these templates. Finally, we comment on the applicability of our approach using results from real-world SPARQL query logs.
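A hypothetical sketch of the template idea, assuming a crude regex-based masking of IRIs and literals; the paper's matching algorithm aligns query structures more carefully than this.

```python
# Illustrative only: reduce queries to templates by masking literals and IRIs,
# then group similar queries; the example queries are made up.
import re
from collections import defaultdict

def template_of(query):
    masked = re.sub(r'"[^"]*"(@\w+)?', "?LIT", query)  # mask string literals
    masked = re.sub(r"<[^>]+>", "?IRI", masked)         # mask full IRIs
    return re.sub(r"\s+", " ", masked).strip()

queries = [
    'SELECT ?p WHERE { <http://dbpedia.org/resource/Berlin> rdfs:label "Berlin"@en . ?s ?p ?o }',
    'SELECT ?p WHERE { <http://dbpedia.org/resource/Paris> rdfs:label "Paris"@fr . ?s ?p ?o }',
]

groups = defaultdict(list)
for q in queries:
    groups[template_of(q)].append(q)

for tpl, qs in groups.items():
    print(len(qs), tpl)
```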
-
Heise, A., Quiané-Ruiz, J.-A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable Discovery of Unique Column Combinations. Proceedings of the VLDB Endowment (PVLDB), Hangzhou, China (2013).
Jorge's presentation at VLDB 2014 was awarded the "Excellent Presentation Award".
The discovery of all unique (and non-unique) column combinations in a given dataset is at the core of any data profiling effort. The results are useful for a large number of areas of data management, such as anomaly detection, data integration, data modeling, duplicate detection, indexing, and query optimization. However, discovering all unique and non-unique column combinations is an NP-hard problem, which in principle requires verifying an exponential number of column combinations for uniqueness on all data values. Thus, achieving efficiency and scalability in this context is a tremendous challenge by itself. In this paper, we devise DUCC, a scalable and efficient approach to the problem of finding all unique and non-unique column combinations in big datasets. We first model the problem as a graph coloring problem and analyze the pruning effect of individual combinations. We then present our hybrid column-based pruning technique, which traverses the lattice with a combination of depth-first and random walk strategies. This strategy allows DUCC to typically depend on the solution set size and hence to prune large swaths of the lattice. DUCC also incorporates row-based pruning to run uniqueness checks in just a few milliseconds. To achieve even higher scalability, DUCC runs on several CPU cores (scale-up) and compute nodes (scale-out) with a very low overhead. We exhaustively evaluate DUCC using three datasets (two real and one synthetic) with several million rows and hundreds of attributes. We compare DUCC with related work: Gordian and HCA. The results show that DUCC is up to more than two orders of magnitude faster than Gordian and HCA (631x faster than Gordian and 398x faster than HCA). Finally, a series of scalability experiments shows the efficiency of DUCC to scale up and out.
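For intuition, the per-combination uniqueness test can be sketched as below; DUCC's contribution is avoiding the exponential enumeration of such tests through lattice pruning, not the test itself, and the toy table is made up.

```python
# Naive uniqueness check for a single column combination (not DUCC's pruned search).
rows = [
    {"first": "Ada",  "last": "Lovelace", "city": "London"},
    {"first": "Ada",  "last": "Byron",    "city": "London"},
    {"first": "Alan", "last": "Turing",   "city": "London"},
]

def is_unique(rows, columns):
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # a duplicate value combination exists
        seen.add(key)
    return True

print(is_unique(rows, ["first"]))          # False: "Ada" appears twice
print(is_unique(rows, ["first", "last"]))  # True: the combination is a key candidate
```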
-
Lange, D., Naumann, F.: Cost-Aware Query Planning for Similarity Search. Information Systems (IS). 38, 455--469 (2013).
-
Naumann, F., Jenders, M., Papenbrock, T.: Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform. Informatik-Spektrum. 37, 333-340 (2013).
-
Naumann, F.: Data Profiling Revisited. SIGMOD Record. 42, 40-49 (2013).
Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies. Data profiling deserves a fresh look for two reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, more and more data beyond the traditional relational databases are being created and beg to be profiled. The article proposes new research directions and challenges, including interactive and incremental profiling and profiling heterogeneous and non-relational data.
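A minimal sketch of single-column profiling in the spirit of the metadata listed above (types, completeness, uniqueness), assuming pandas; commercial tools and the dependency-discovery tasks discussed in the article go far beyond this.

```python
# Toy single-column profiler: data type, completeness, distinctness, key candidacy.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    stats = []
    for col in df.columns:
        s = df[col]
        stats.append({
            "column": col,
            "dtype": str(s.dtype),
            "completeness": 1.0 - s.isna().mean(),  # share of non-null values
            "distinct": s.nunique(dropna=True),
            "is_key_candidate": s.nunique(dropna=True) == len(s) and not s.isna().any(),
        })
    return pd.DataFrame(stats)

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", None]})
print(profile(df))
```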
-
Momtazi, S., Naumann, F.: Topic modeling for expert finding using latent Dirichlet allocation. WIREs Data Mining and Knowledge Discovery. 3, 346–353 (2013).
The task of expert finding is to rank the experts in the search space given a field of expertise as an input query. In this paper, we propose a topic modeling approach for this task. The proposed model uses latent Dirichlet allocation (LDA) to induce probabilistic topics. In the first step of our algorithm, the main topics of a document collection are extracted using LDA. The extracted topics present the connection between expert candidates and user queries. In the second step, the topics are used as a bridge to find the probability of selecting each candidate for a given query. The candidates are then ranked based on these probabilities. The experimental results on the Text REtrieval Conference (TREC) Enterprise track for 2005 and 2006 show that the proposed topic-based approach outperforms the state-of-the-art profile- and document-based models, which use information retrieval methods to rank experts. Moreover, we present the superiority of the proposed topic-based approach to the improved document-based expert finding systems, which consider additional information such as local context, candidate prior, and query expansion.
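A rough sketch of the two-step idea using gensim's LDA implementation; the toy documents, the dot-product ranking, and all parameter choices are assumptions rather than the paper's exact probabilistic model.

```python
# Sketch: topics act as a bridge between a query and each candidate's documents,
# and candidates are ranked by how well their topic mixture matches the query's.
from gensim import corpora, models

candidate_docs = {
    "alice": ["database", "schema", "integration", "cleaning"],
    "bob":   ["retrieval", "ranking", "query", "search"],
}
texts = list(candidate_docs.values())
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0, passes=10)

def topic_vector(tokens):
    bow = dictionary.doc2bow(tokens)
    return dict(lda.get_document_topics(bow, minimum_probability=0.0))

query_vec = topic_vector(["data", "integration"])
scores = {
    name: sum(query_vec.get(t, 0.0) * p for t, p in topic_vector(doc).items())
    for name, doc in candidate_docs.items()
}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```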
-
Rinser, D., Lange, D., Naumann, F.: Cross-lingual Entity Matching and Infobox Alignment in Wikipedia. Information Systems (IS). 38, 887–907 (2013).
-
Abedjan, Z., Naumann, F.: Improving RDF Data through Association Rule Mining. Datenbank-Spektrum (Special Issue on RDF Data Management). 13, 111--120 (2013).
-
Bauckmann, J., Abedjan, Z., Leser, U., Müller, H., Naumann, F.: Covering or Complete? Discovering Conditional Inclusion Dependencies. Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam (2012).
ISBN 978-3-86956-212-4, ISSN 1613-5652
-
Draisbach, U., Naumann, F.: Adaptive Windows for Duplicate Detection. Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam (2012).
ISBN 978-3-86956-143-1, ISSN 1613-5652
-
Albrecht, A., Naumann, F.: Understanding Cryptic Schemata in Large Extract-Transform-Load Systems. Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam (2012).
ISBN 978-3-86956-201-8, ISSN 1613-5652
-
Böhm, C., de Melo, G., Naumann, F., Weikum, G.: LINDA: Distributed Web-of-Data-Scale Entity Matching. Proceedings of the International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii (2012).
-
Fenz, D., Lange, D., Rheinländer, A., Naumann, F., Leser, U.: Efficient Similarity Search in Very Large String Sets. Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM), Chania, Crete, Greece (2012).
-
Köppelmann, M., Lange, D., Lehmann, C., Marszalkowski, M., Naumann, F., Retzlaff, P., Stange, S., Voget, L.: Scalable Similarity Search with Dynamic Similarity Measures. Proceedings of the 6th International Workshop on Ranking in Databases (DBRank) in conjunction with VLDB, Istanbul, Turkey (2012).
-
Abedjan, Z., Lorey, J., Naumann, F.: Reconciling Ontologies and the Web of Data. Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM). pp. 1532-1536, Maui, Hawaii, USA (2012).
-
Heise, A., Rheinländer, A., Leich, M., Leser, U., Naumann, F.: Meteor/Sopremo: An Extensible Query Language and Operator Model. Proceedings of the International Workshop on End-to-end Management of Big Data (BigData) in conjunction with VLDB 2012, Istanbul, Turkey (2012).
-
Böhm, C., Freitag, M., Heise, A., Lehmann, C., Mascher, A., Naumann, F., Hernandez, M., Ercegovac, V., Haase, P.: GovWILD: Integrating Open Government Data for Transparency (demo). Proceedings of the International World Wide Web Conference (WWW), Lyon, France (2012).
-
Tafaj, E., Kasneci, G., Rosenstiel, W., Bogdan, M.: Bayesian online clustering of eye movement data. Proceedings of the 2012 Symposium on Eye-Tracking Research and Applications. pp. 285-288. ACM (2012).
-
Vogel, T., Naumann, F.: Automatic Blocking Key Selection for Duplicate Detection based on Unigram Combinations. Proceedings of the 10th International Workshop on Quality in Databases (QDB) in conjunction with VLDB (2012).
Duplicate detection is the process of identifying multiple but different representations of the same real-world objects, which typically involves a large number of comparisons. Partitioning is a well-known technique to avoid many unnecessary comparisons. However, partitioning keys are usually handcrafted, which is tedious, and the keys are often poorly chosen. We propose a technique to find suitable blocking keys automatically for a dataset equipped with a gold standard. We then show how to re-use those blocking keys for datasets from similar domains lacking a gold standard. Blocking keys are created based on unigrams, which we extend with length-hints for further improvement. Blocking key creation is accompanied by several comprehensive experiments on large artificial and real-world datasets.
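A small sketch of scoring one candidate blocking key against a gold standard via pair completeness and reduction ratio; the records, the gold pairs, and the specific key function are made up, and the paper additionally searches over unigram combinations and length hints.

```python
# Illustrative scoring of a single candidate blocking key against a gold standard.
from itertools import combinations

records = {1: "john smith", 2: "jon smith", 3: "mary jones", 4: "maria jones"}
gold_pairs = {(1, 2), (3, 4)}  # known duplicates

def candidate_pairs(blocking_key):
    """Group records by a key and emit all pairs within each block."""
    blocks = {}
    for rid, value in records.items():
        blocks.setdefault(blocking_key(value), []).append(rid)
    return {p for ids in blocks.values() for p in combinations(sorted(ids), 2)}

key = lambda v: v.split()[-1][0]  # toy key: first character of the last token
pairs = candidate_pairs(key)
all_pairs = len(records) * (len(records) - 1) / 2
print("pair completeness:", len(pairs & gold_pairs) / len(gold_pairs))
print("reduction ratio:", 1 - len(pairs) / all_pairs)
```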
-
Albrecht, A., Naumann, F.: Schema Decryption for Large Extract-Transform-Load Systems. Proceedings of the 31st International Conference on Conceptual Modeling (ER 2012), Florence, Italy (2012).
-
Böhm, C., Kasneci, G., Naumann, F.: Latent Topics in Graph-Structured Data. Proceedings of the Conference on Information and Knowledge Management (CIKM) (2012).
Large amounts of graph-structured data are emerging from various avenues, ranging from natural and life sciences to social and semantic web communities. We address the problem of discovering subgraphs of entities that reflect latent topics in graph-structured data. These topics are structured meta-information providing further insights into the data. The presented approach effectively detects such topics by exploiting only the structure of the underlying graph, thus avoiding the dependency on textual labels, which are a scarce asset in prevalent graph datasets. The viability of our approach is demonstrated in experiments on real-world datasets.
-
Gruetze, T., Böhm, C., Naumann, F.: Holistic and Scalable Ontology Alignment for Linked Open Data. Proceedings of the 5th Linked Data on the Web (LDOW) Workshop at the 21st International World Wide Web Conference (WWW), Lyon, France (2012).
-
Momtazi, S.: Fine-grained German Sentiment Analysis on Social Media. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey (2012).
-
Böhm, C., Hefenbrock, D., Naumann, F.: Scalable Peer-to-Peer-based RDF Management. Proceedings of the 8th International Conference on Semantic Systems, Graz, Austria (2012).
Handling web-scale RDF data requires sophisticated data management that scales easily and integrates seamlessly into existing analysis workflows. We present HDRS – a scalable storage infrastructure that enables online analysis of very large RDF data sets. HDRS combines state-of-the-art data management techniques to organize triples in indexes that are sharded and stored in a peer-to-peer system. The store is open source at http://code.google.com/p/hdrs and integrates well with Hadoop MapReduce or any other client application.
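A hypothetical sketch of distributing triples over peers by hashing an index key; HDRS's actual index organization and shard assignment may differ, so treat this purely as intuition about sharding triple indexes in a peer-to-peer setting.

```python
# Toy hash-based sharding of triples over peers; peer names and orderings are assumptions.
import hashlib

PEERS = ["peer-0", "peer-1", "peer-2"]

def peer_for(triple, order="spo"):
    # Build the index key in the requested ordering, then hash it onto a peer.
    s, p, o = triple
    key = {"spo": (s, p, o), "pos": (p, o, s), "osp": (o, s, p)}[order]
    digest = hashlib.sha1("\x00".join(key).encode("utf-8")).hexdigest()
    return PEERS[int(digest, 16) % len(PEERS)]

print(peer_for(("dbr:Berlin", "dbo:country", "dbr:Germany")))
```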
-
Kasneci, G.: Reasoning about Knowledge from the Web - (Extended Abstract). ICWE Workshops. pp. 186-188. Springer (2012).
-
Bauckmann, J., Abedjan, Z., Müller, H., Leser, U., Naumann, F.: Discovering Conditional Inclusion Dependencies. Proceedings of the International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii. pp. 2094-2098 (2012).
-
Draisbach, U., Naumann, F., Szott, S., Wonneberg, O.: Adaptive Windows for Duplicate Detection. Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, D.C., USA (2012).
-
Draisbach, U.: Partitionierung zur effizienten Duplikaterkennung in relationalen Daten. Springer Vieweg (2012).
-
Beskales, G., Das, G., Elmagarmid, A.K., Ilyas, I.F., Naumann, F., Ouzzani, M., Papotti, P., Quiané-Ruiz, J., Tang, N.: The Data Analytics Group at the Qatar Computing Research Institute. SIGMOD Record. 41, (2012).
-
Abelló, A., Darmont, J., Etcheverry, L., Golfarelli, M., Mazón, J.-N., Naumann, F., Pedersen, T.B., Rizzi, S., Trujillo, J., Vassiliadis, P., Vossen, G.: Fusion Cubes: Towards Self-Service Business Intelligence. International Journal of Data Warehousing and Mining (IJDWM). 9, 66-88 (2012).
Self-service business intelligence is about enabling non-expert users to make well-informed decisions by enriching the decision process with situational data, i.e., data that have a narrow focus on a specific business problem and, typically, a short lifespan for a small group of users. Often, these data are not owned and controlled by the decision maker; their search, extraction, integration, and storage for reuse or sharing should be accomplished by decision makers without any intervention by designers or programmers. The goal of this paper is to present the framework we envision to support self-service business intelligence and the related research challenges. The underlying core idea is the notion of fusion cubes, i.e., multidimensional cubes that can be dynamically extended both in their schema and their instances, and in which situational data and metadata are associated with quality and provenance annotations.
-
Heise, A., Naumann, F.: Integrating Open Government Data with Stratosphere for more Transparency. Web Semantics: Science, Services and Agents on the World Wide Web. 14, 45-56 (2012).
Governments are increasingly publishing their data to enable organizations and citizens to browse and analyze the data. However, the heterogeneity of this Open Government Data hinders meaningful search, analysis, and integration and thus limits the desired transparency. In this article, we present the newly developed data integration operators of the Stratosphere parallel data analysis framework to overcome the heterogeneity. With declaratively specified queries, we demonstrate the integration of well-known government data sources and other large open data sets at technical, structural, and semantic levels. Furthermore, we publish the integrated data on the Web in a form that enables users to discover relationships between persons, government agencies, funds, and companies. The evaluation shows that linking person entities of different data sets results in a good precision of 98.3% and a recall of 95.2%. Moreover, the integration of large data sets scales well on up to eight machines.
-
Herschel, M., Naumann, F., Szott, S., Taubert, M.: Scalable Iterative Graph Duplicate Detection. Transactions on Knowledge and Data Engineering (TKDE). 24, 2094-2108 (2012).