Management of Inconsistent and Uncertain Data
Renée Miller (University of Toronto)
Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. In this talk, I will describe ConQuer, a system for efficient and scalable answering of SQL queries on databases containing inconsistent or uncertain data. ConQuer permits users to postulate a set of constraints together with their queries. The system rewrites the queries to retrieve only data that is consistent with respect to the constraints. When data is uncertain, ConQuer returns each query answer with a likelihood that the answer is consistent. Hence, ConQuer allows a user to understand which query answers are known to be true, even when a database contains uncertainty. The queries are rewritten into SQL, and I will show that the rewritten queries can be efficiently optimized and executed by a commercial database system. I will conclude with some open problems.
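As an illustration of the idea of rewriting a query so that it returns only consistent answers, here is a minimal sketch. It is not ConQuer's rewriting algorithm. The table `customer`, its columns `custkey` and `acctbal`, and the function `answer_queries` are hypothetical names chosen for this example. It assumes a single key constraint (`custkey` should identify one row) that the data violates, and uses SQLite only as a convenient stand-in for a database system. An answer counts as consistent here if it holds no matter which of the conflicting rows for a key is kept.

```python
import sqlite3


def answer_queries(rows, threshold):
    """Return (possible, consistent) customer keys with acctbal > threshold.

    rows: iterable of (custkey, acctbal) pairs; the same custkey may appear
    more than once, which violates the assumed key constraint.
    """
    conn = sqlite3.connect(":memory:")
    # No PRIMARY KEY is declared, so conflicting rows can coexist.
    conn.execute("CREATE TABLE customer (custkey TEXT, acctbal REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

    # Original query: ignores the constraint, so it may return keys whose
    # qualifying row conflicts with another row for the same key.
    original = """
        SELECT DISTINCT custkey FROM customer
        WHERE acctbal > ?
        ORDER BY custkey
    """

    # Hand-written rewriting (an assumption for this example): keep a key
    # only if no row sharing that key fails the condition.
    rewritten = """
        SELECT DISTINCT c.custkey FROM customer c
        WHERE c.acctbal > ?
          AND NOT EXISTS (
              SELECT 1 FROM customer c2
              WHERE c2.custkey = c.custkey AND c2.acctbal <= ?)
        ORDER BY c.custkey
    """

    possible = [r[0] for r in conn.execute(original, (threshold,))]
    consistent = [r[0] for r in conn.execute(rewritten, (threshold, threshold))]
    conn.close()
    return possible, consistent


if __name__ == "__main__":
    data = [
        ("c1", 2000.0), ("c1", 100.0),   # conflicting rows, one fails
        ("c2", 2500.0),                  # no conflict
        ("c3", 2200.0), ("c3", 2600.0),  # conflicting rows, both qualify
    ]
    print(answer_queries(data, 1000.0))
```

In this sketch, `c1` is returned by the original query but dropped by the rewritten one, because one of its conflicting rows has a balance below the threshold. `c3` is kept even though its rows conflict, since every candidate row satisfies the condition. The rewritten query is itself plain SQL, so it can be handed to an ordinary query optimizer.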
Bio
Renée J. Miller is a professor of computer science and the Bell University Lab Chair of Information Systems at the University of Toronto. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She has also received an NSF CAREER Award, the Premier's Research Excellence Award, and an IBM Faculty Award. Her research interests are in the efficient, effective use of large volumes of complex, heterogeneous data. This interest spans data integration and exchange, inconsistent and uncertain data management, and knowledge curation. She serves on the Board of Trustees of the VLDB Endowment, was a member and chair of the ACM Kanellakis Awards committee, and served as PC co-chair of VLDB in 2004. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor's degrees in Mathematics and Cognitive Science from MIT.
Data Quality Challenges in Community Systems
AnHai Doan (University of Wisconsin)
Over the past three years, in Cimple, a joint effort between Wisconsin and Yahoo! Research, we have been trying to build community systems. Such systems employ automatic data management techniques, such as information extraction and integration, as well as user-centric Web 2.0-style technologies, to build structured data portals for online communities. As the work has progressed, we have encountered a broad range of fascinating data cleaning challenges. Some of these (e.g., data quality evaluation, record reconciliation) also arise in traditional ETL processes, but here they become exacerbated, take on new nuances, or are amenable to novel solutions that exploit community characteristics. Many other challenges, however, are new and arise because community systems engage a multitude of users of varying skills and knowledge. Examples include how to entice users to collaboratively clean data, how to handle "noisy" users, and how to make certain cleaning tasks easy for "the masses". We describe the challenges and our initial solutions. We also describe the infrastructure support (code, data, etc.) that we can provide, in the hope that other researchers will join and help us address these problems.
Bio
AnHai Doan works in the database group at the University of Wisconsin-Madison. His interests cover databases, AI, and the Web. His current research focuses on Web community management, data integration, mass collaboration, text management, information extraction, and schema matching. Selected recent honors include the ACM Doctoral Dissertation Award (2003), a CAREER Award (2004), an Alfred P. Sloan Research Fellowship (2007), and an IBM Faculty Award (2007). Selected recent professional activities include co-chairing WebDB at SIGMOD-05 and the AI Nectar track at AAAI-06.