Prof. Dr. Felix Naumann

For current projects in the area of data quality and data cleansing, please see our work on data preparation and data profiling.

Completed projects

In the past, we have built various large and small data integration systems. They are no longer maintained, and many if not most of them are no longer actively running. Please contact Felix Naumann to learn more.

  • CurEx: A system for extracting, curating, and exploring domain-specific knowledge graphs
  • DuDe: A duplicate detection framework and suite of algorithms and datasets
  • Annealing Standard: A system to gradually build a gold standard for classification problems
  • Aladin: A system to perform almost automatic integration of datasets
  • BibTeX Deduplication: An online service to deduplicate bibliography files
  • DAQS: Data Quality as a Service
  • Data Fusion: Technologies for combining duplicates into single consistent records
  • Dirty XML Generator: Create XML data with duplicates for evaluation purposes
  • GovWILD: An integrated set of government data to explore nepotism in politics and the economy
  • HiQIQ: High quality information querying
  • MAC / Hummer: A system to integrate heterogeneous datasets, including schema matching, deduplication, and data fusion steps
  • METL: Systematic management of sets of complex ETL processes
  • MyDBLP: Systematically annotate bibliographic data
  • Service Integration with Posr/Depot/Faster: Search, maintain, and compose data services
  • Similarity Search: Methods to efficiently find similar records in large databases
  • System P: A peer data management system (PDMS) for data integration
  • Viqtor: Bulk data quality annotations
  • XClean: Cleaning and deduplicating XML data
  • XQueryGen: An interactive tool to create complex XQueries