Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Current projects

  • CurEx: A system for extracting, curating, and exploring domain-specific knowledge graphs
  • DuDe: A duplicate detection framework and suite of algorithms and datasets
  • Annealing Standard: A system to gradually build a gold standard for classification problems

Completed projects

In the past, we have built various large and small data integration systems. They are no longer maintained, and many, if not most, of them are no longer actively running. Please contact Felix Naumann to learn more.

  • Aladin: A system to perform almost automatic integration of datasets.
  • BibTex Deduplication: An online service to deduplicate bibliography files.
  • DAQS: Data Quality as a Service
  • Data Fusion: Technologies for combining duplicates into single consistent records
  • Dirty XML Generator: Create XML data with duplicates for evaluation purposes
  • GovWILD: An integrated set of government data to explore nepotism in politics and economy.
  • HiQIQ: High quality information querying
  • MAC / Hummer: A system to integrate heterogeneous datasets, including schema matching, deduplication and data fusion steps.
  • METL: Systematic management of sets of complex ETL processes
  • MyDBLP: Systematically annotate bibliographic data
  • Service Integration with Posr/Depot/Faster: Search, maintain and compose data services
  • Similarity Search: Methods to efficiently find similar records in large databases
  • System P: A peer data management system (PDMS) for data integration
  • Viqtor: Bulk data quality annotations
  • XClean: Cleaning and deduplicating XML data
  • XQueryGen: An interactive tool to create complex XQueries