Prof. Dr. Felix Naumann

Dr. Melanie Herschel

Former member


Research Areas

  • Duplicate Detection
  • Data Cleaning


Ongoing Projects

  • Duplicate detection in cooperation with Schufa AG.
  • HumMer: Der Humboldt Merger. Tool for ad-hoc integration of relational data.
  • XClean: Declarative XML Data Cleaning. In cooperation with Ioana Manolescu (INRIA Futurs, France) and Helena Galhardas (IST Tagus Park, Portugal).

Completed Projects

  • Dirty XML Generator. Tool to generate inexact duplicates in XML data.
  • XQuery Generator. Tool to graphically support XQuery generation.

Teaching

  • Winter 07/08: "Schema Matching" seminar
  • Summer 07: "Data Cleaning" seminar
  • Winter 06/07: "Data fusion in three steps" seminar
  • Winter 04/05: Practical course "Information Integration" at Humboldt University Berlin
  • Summer 04: Practical course "Information Integration II" at Humboldt University Berlin

Professional Activities

  • Program committee member of the DataX 2008 workshop
  • Reviewer for IS, JDIQ, TOIT

Publications

  • Structure-Based Inference of XML Similarity for Fuzzy Duplicate Detection. Luis Leitao, Pavel Calado, and Melanie Weis. CIKM 2007, Lisboa, Portugal.
  • Declarative XML Data Cleaning with XClean. Melanie Weis and Ioana Manolescu. CAISE 2007, Trondheim, Norway.
  • XML Duplicate Detection Using Sorted Neighborhoods. Sven Puhlmann, Melanie Weis and Felix Naumann. EDBT 2006, Munich, Germany.
  • DogmatiX Tracks down Duplicates in XML. Melanie Weis and Felix Naumann. SIGMOD 2005, Baltimore, MD. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/SIGMOD05.pdf]


  • A Duplicate Detection Benchmark for XML (and Relational) Data. Melanie Weis, Felix Naumann and Franziska Brosy. SIGMOD 2006 Workshop on Information Quality for Information Systems (IQIS), Chicago, IL. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/benchmark_iqis06.pdf]
  • XStruct: Efficient Schema Extraction from Multiple and Large XML Documents. Jan Hegewald, Felix Naumann and Melanie Weis. ICDE 2006 Workshop on XML Schema and Data Management (XSDM), Atlanta, Georgia. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/XSDM06.pdf]
  • Fuzzy Duplicate Detection on XML Data. Melanie Weis. VLDB 2005 PhD Workshop, Trondheim, Norway. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/VLDB05Phd_xmloid.pdf]
  • Detecting Duplicate Objects in XML Documents. Melanie Weis and Felix Naumann. SIGMOD 2004 Workshop on Information Quality for Information Systems (IQIS), Paris, France. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/IQIS04.pdf]

Posters & Demos

  • XClean in Action (demo). Melanie Weis and Ioana Manolescu. CIDR 2007, Asilomar, California. To appear. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/xclean_crv.pdf]
  • Detecting Duplicates in Complex XML Data (poster). Melanie Weis and Felix Naumann. ICDE 2006, Atlanta, Georgia. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/ICDE06.pdf]
  • Automatic Data Fusion with HumMer (demo). Alexander Bilke, Jens Bleiholder, Christoph Böhm, Karsten Draba, Felix Naumann, Melanie Weis. VLDB 2005, Trondheim, Norway. [PDF: fileadmin/user_upload/fachgebiete/naumann/publications/VLDB2005.pdf]


  • Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies. Felix Naumann, Alexander Bilke, Jens Bleiholder, Melanie Weis. Bulletin of the Technical Committee on Data Engineering, Vol. 29 No. 2, June 2006, 21-31. [PDF: http://www.hpi.uni-potsdam.de/fileadmin/user_upload/fachgebiete/naumann/publications/DEBull06.pdf]
  • Erkennen und Bereinigen von Datenfehlern in naturwissenschaftlichen Daten (in German). Heiko Müller, Melanie Weis, Jens Bleiholder and Ulf Leser. Datenbank-Spektrum, Heft 15, November 2005, 36-43.
  • Eine Übung zur Vorlesung Informationsintegration (in German). Felix Naumann, Jens Bleiholder, Melanie Weis. Datenbank-Spektrum, Heft 11, November 2004, 50-52.

Technical Reports