Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

International Workshop on Quality in Databases (QDB) 2007

5th International Workshop on Quality in Databases at VLDB

September 23, 2007, Vienna, Austria

Welcome

News

  • QDB 2007 is now on DBLP.
  • We are happy to announce invited talks by Renée Miller (University of Toronto) and AnHai Doan (University of Wisconsin)!
  • Workshop registration is now open through the VLDB registration site. Due to generous sponsorship by Microsoft Research, the workshop fee is only 40 Euro!

Quality in Databases

Data and information quality has become an increasingly important and interesting topic for the database community. Solutions to measure and improve the quality of data stored in databases are relevant for many areas, including data warehouses, scientific databases, and customer relationship management. The QDB 2007 workshop focuses on practical methods for data quality assessment and data quality improvement. QDB'07 continues and combines the three successful IQIS workshops held at SIGMOD 2004-2006 and the CleanDB workshop held at VLDB 2006.

Call for Papers

Topics of Interest

include but are not limited to

  • Duplicate detection / entity resolution
  • Data scrubbing / data standardization
  • Data quality benchmarks
  • Data quality assessment / measures
  • Data quality models and algebra
  • Quality-aware query languages
  • Quality-aware query processing techniques
  • Quality of information on the Web
  • Quality of scientific, geographical, and multimedia data
  • Data quality in multi-database settings

Important Dates

Submission deadline (passed)
June 29, 2007, 9pm PST

Notification (passed)
August 10, 2007

Final version
August 31, 2007

Workshop
September 23, 2007

Manuscript Preparation

Format
It is the authors' responsibility to ensure that their submissions adhere to the formatting detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.

Please note that papers must be submitted in the VLDB format, which is described in detail here. Note that the limit of 12 pages must be followed.

Publication
Accepted papers will be distributed in informal proceedings at the workshop.

Program

The program will feature invited talks by Renée Miller (University of Toronto) and AnHai Doan (University of Wisconsin). Stay tuned for a detailed program. Please register at the VLDB registration site. Due to generous sponsorship by Microsoft Research, the workshop fee is only 40 Euro!

Introduction
9:15 - 9:30 Felix Naumann and Venky Ganti (pdf)

Invited talk: Renée Miller
9:30 - 10:30
Management of Inconsistent and Uncertain Data (abstract below)
by Renée J. Miller (University of Toronto)

Coffee break
10:30 - 11:00

Research Session 1
11:00 - 13:00
Accuracy of Approximate String Joins Using Grams (pdf, slides)
Oktie Hassanzadeh, Mohammad Sadoghi, and Renée J. Miller

QoS: Quality Driven Data Abstraction Generation For Large Databases
Charudatta V. Wad, Elke A. Rundensteiner and Matthew O. Ward

Quality-Driven Mediation for Geographic Data
Yassine Lassoued, Mehdi Essid, Omar Boucelma, and Mohamed Quafafou

On the performance of one-to-many data transformations
Paulo Carreira, Helena Galhardas, Joao Pereira, Fernando Martins, and Mario J. Silva

Lunch
13:00 - 14:00

Invited talk: AnHai Doan
14:00 - 15:00
Data Quality Challenges in Community Systems (abstract below)
by AnHai Doan (University of Wisconsin)

Research Session 2
15:00 - 16:00
Towards a Benchmark for ETL Workflows (pdf)
Panos Vassiliadis, Anastasios Karagiannis, Vasiliki Tziovara, and Alkis Simitsis

Information Quality Measurement in Data Integration Schemas (pdf, slides)
Maria da Conceição Moraes Batista, Ana Carolina Salgado

Coffee break
16:00 - 16:30

Program Committee

Program Chairs
Venky Ganti, Microsoft Research, Redmond
Felix Naumann, Hasso-Plattner-Institut, Potsdam

Program Committee
Laure Berti, IRISA, France
Tiziana Catarci, Universita di Roma, La Sapienza
Ariel Fuxman, University of Toronto
Helena Galhardas, Technical University of Lisbon, Portugal
Michael Gertz, UC Davis
Mauricio Hernandez, IBM Almaden
Vipul Kashyap, Partners HealthCare System
Raghav Kaushik, Microsoft Research
Nick Koudas, University of Toronto
Chen Li, UC Irvine
Michael Mielke, Deutsche Bahn AG

Past Events

IQIS 2004
International Workshop on Information Quality in Information Systems
Co-located with SIGMOD 2004 in Paris.
http://www.hiqiq.de/iqis/

IQIS 2005
International Workshop on Information Quality in Information Systems
Co-located with SIGMOD 2005 in Baltimore.
http://iqis.irisa.fr/

IQIS 2006
International Workshop on Information Quality in Information Systems
Co-located with SIGMOD 2006 in Chicago.
http://queens.db.toronto.edu/iqis2006/

CleanDB 2006
International VLDB Workshop on Clean Databases
Co-located with VLDB 2006 in Seoul.
http://pike.psu.edu/cleandb06/

Abstracts

Management of Inconsistent and Uncertain Data
Renée Miller (University of Toronto)

Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. In this talk, I will describe ConQuer, a system for efficient and scalable answering of SQL queries on databases containing inconsistent or uncertain data. ConQuer permits users to postulate a set of constraints together with their queries. The system rewrites the queries to retrieve data that are consistent with respect to the constraints. When data is uncertain, ConQuer returns each query answer with a likelihood that the answer is consistent. Hence, ConQuer allows a user to understand what query answers are known to be true, even when a database contains uncertainty. Our rewriting is into SQL, and I will show that the rewritten queries can be efficiently optimized and executed by a commercial database system. I will conclude with some open problems.

Bio
Renée J. Miller is a professor of computer science and the Bell University Lab Chair of Information Systems at the University of Toronto. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Premier's Research Excellence Award, and an IBM Faculty Award. Her research interests are in the efficient, effective use of large volumes of complex, heterogeneous data. This interest spans data integration and exchange, inconsistent and uncertain data management, and knowledge curation. She serves on the Board of Trustees of the VLDB Endowment, was a member of and chaired the ACM Kanellakis Awards committee, and served as PC co-chair of VLDB in 2004. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor's degrees in Mathematics and Cognitive Science from MIT.

Data Quality Challenges in Community Systems
AnHai Doan (University of Wisconsin)

Over the past three years, in Cimple, a joint effort between Wisconsin and Yahoo! Research, we have been trying to build community systems. Such systems employ automatic data management techniques, such as information extraction and integration, as well as user-centric Web 2.0-style technologies, to build structured data portals for online communities. As the work progresses, we have encountered a broad range of fascinating data cleaning challenges. Some of these (e.g., data quality evaluation, record reconciliation) also arise in traditional ETL processes. But here they become exacerbated, take on new nuances, or are amenable to novel solutions that exploit community characteristics. Many other challenges, however, are new and arise because community systems engage a multitude of users of varying skills and knowledge. Examples include how to entice users to collaboratively clean data, how to handle "noisy" users, and how to make certain cleaning tasks easy for "the masses". We describe the challenges and our initial solutions. We also describe the infrastructure support (code, data, etc.) that we can provide, in the hope that other researchers will join and help us address these problems.

Bio
AnHai Doan works in the database group at the University of Wisconsin-Madison. His interests cover databases, AI, and the Web. His current research focuses on Web community management, data integration, mass collaboration, text management, information extraction, and schema matching. Selected recent honors include the ACM Doctoral Dissertation Award (2003), CAREER Award (2004), Alfred P. Sloan Research Fellowship (2007), and IBM Faculty Award (2007). Selected recent professional activities include co-chairing WebDB at SIGMOD-05 and the AI Nectar track at AAAI-06.