International Workshop on Quality in Databases (QDB) 2024
13th International Workshop on Quality in Databases at the 50th VLDB conference
August 26, 2024, Guangzhou, China
Welcome
News
- The workshop program is online (last update 20.8.2024).
- Paper notifications are out.
- Submission deadline was extended upon request to June 14. Submit your paper via CMT here.
- Quanqing Xu (Senior Researcher at Oceanbase) will share his experience on data quality in an industry talk at QDB'24.
- QDB'24 will feature an invited keynote by Sebastian Schelter (TU Berlin) to talk about his latest research on data quality.
- The QDB’24 workshop proposal has been accepted as a VLDB workshop.
Quality in Databases
Data quality has been a major concern of organizations for decades. The recent advances in artificial intelligence (AI) have brought data quality (DQ) back into the spotlight: while many recent data quality and cleaning solutions are powered by ML, DQ is a core requirement to ensure reliable AI-based systems. DQ is tackled from different perspectives by different research communities, including database, machine learning (ML), and information systems. We believe it is important to bring together these communities to foster a vital discussion about the future of DQ assessment and improvement.
Considering the large number of participants (>50) at QDB’23, QDB'24 aims to (1) continue to host the vital discussions about data quality, and (2) exchange best practices and novel methods for (semi-)automated (ML-based) data quality assessment and improvement in the context of AI-based systems.
Program
| 09:00-09:15 | Opening | Lisa & Hazar |
| 09:15-10:30 | Research Session 1 | Chair: Lisa |
| | Accelerating the Data Cleaning Systems Raha and Baran through Task and Data Parallelism (Fatemeh Ahmadi, Yusuf Mandirali, Ziawasch Abedjan) | |
| | Valuation-based Data Acquisition for Machine Learning Fairness (Ekta and Romila Pradhan) | |
| | AutoFAIR: Automatic Data FAIRification via Machine Reading (Tingyan Ma, Wei Liu, Bin Lu, Xiaoying Gan, Yunqiang Zhu, Luoyi Fu, Chenghu Zhou) | |
| 10:30-11:00 | Coffee break | |
| 11:00-12:30 | Keynote + 1 Research Paper | Chair: Hazar |
| | Invited talk by Sebastian Schelter: How Data Management Research Helps to Improve Real World ML Applications | |
| | Compute Engine Testing with Privacy-Compliant Production-Like Synthetic Data | |
| 12:30-14:00 | Lunch | |
| 14:00-15:30 | Industry Session | Chair: Sourav |
| | Industry talk by Quanqing Xu (Oceanbase) | |
| | Industry talk by Divesh Srivastava (AT&T Labs) | |
| | Panel discussion with Quanqing Xu, Divesh Srivastava, and Fatma Ozcan | |
| 15:30-16:00 | Coffee break | |
| 16:00-18:15 | Research Session 2 | Chair: Hazar |
| | Process Model-based Access Control Policies for Cross-Organizational Data Sharing (Liam Tirpitz, Leon Gentges) | |
| | Tracking Consistency over Data Streams with InkStream [Demo] (Samuele Langhi, Angela Bonifati, Riccardo Tommasini) | |
| | A Data Generator to Explore the Interactions Between Concept Drifts and Anomalies [Demo] (Jongjun Park, Akanksha Nehete, Tammy Zeng, Fei Chiang) | |
| | Towards Semi-Supervised Data Quality Detection In Graphs (Rubab Zahra Sarfraz) | |
| 18:15-18:30 | Closing | Chairs |
Program Committee
Program Chairs
Sourav S Bhowmick (Nanyang Technological University, Singapore)
Lisa Ehrlinger (Hasso Plattner Institute, University of Potsdam, Germany)
Hazar Harmouch (University of Amsterdam, Netherlands)
Steering Committee
Ihab Ilyas (Apple, University of Waterloo, USA)
Felix Naumann (Hasso Plattner Institute, University of Potsdam, Germany)
Program Committee
Ziawasch Abedjan (TU Berlin, Germany)
Antoon Bronselaer (Ghent University, Belgium)
Felix Biessmann (Einstein Center Digital Future, Germany)
Ismael Caballero (University of Castilla La Mancha, Spain)
Cinzia Cappiello (Politecnico di Milano, Italy)
Chang Ge (University of Minnesota, USA)
Christine Legner (University of Lausanne, Switzerland)
Sebastian Link (University of Auckland, New Zealand)
Elizabeth Pierce (University of Arkansas at Little Rock, USA)
Kai-Uwe Sattler (TU Ilmenau, Germany)
Sebastian Schelter (TU Berlin, Germany)
John Talburt (University of Arkansas at Little Rock, USA)
Panos Vassiliadis (University of Ioannina, Greece)
Wolfram Wöß (Johannes Kepler University Linz, Austria)
Keynote
Title: How Data Management Research Helps to Improve Real World ML Applications
Abstract: The talk will give an overview of our past and recent research to improve data quality in ML applications, based on proven principles and techniques from data management. In particular, we will cover work on declarative data unit tests tailored for large-scale data lakes, on reasoning about the datasets for ML applications by treating ML pipelines as algebraic queries, and on leveraging fine-grained data provenance as a foundation for data debugging systems.
Sebastian Schelter is a Full Professor at the Berlin Institute on the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin. His research is focused on the intersection of data management and machine learning with the goal to foster the responsible management of data and to democratise data science technologies. The research of his group is accompanied by efficient and scalable open source implementations, many of which are applied in real world use cases, for example in the Amazon Web Services cloud and in large European e-commerce platforms. In the past, he has been an assistant professor at the University of Amsterdam, a faculty fellow at New York University, a senior applied scientist at Amazon Research and a research intern at Twitter and IBM Almaden in California. His research contributions have been recognized with an ACM SIGMOD Systems Award, an ACM SIGMOD Best Demo Runner Up Award, and a Best Paper Runner Up Award from the Table Representation Learning workshop at NeurIPS.
Call for Papers
Topics of Interest
The focus is on new and practical methods for (semi-)automated (ML-based) data quality assessment and improvement. The topics of interest include, but are not limited to:
- Data preprocessing
- Data profiling for data quality measurement
- Explainable data cleaning
- DQ requirements for generative AI systems
- DQ using generative AI
- Data quality assessment for AI-based systems
- Data quality improvement / data cleaning for AI-based systems
- Benchmark data sets to evaluate DQ assurance methods
- Automation of DQ assessment and improvement methods
- Methods to scale data quality assessment and cleansing
- ML-powered methods for improving data quality
- Data quality in graph-structured or time-series data
- Metadata management to improve data quality
- Data quality in different data science domains
- Human-in-the-loop approaches for DQ
- Post-training quality / fact checking
- FAIRness in data quality
Important Dates
Submission deadline
(May 31, 2024, 9pm PST)
Extension to June 14, 2024, 9pm PST
Notification
July 22, 2024
Final version
August 5, 2024
Workshop
August 26, 2024
Manuscript Preparation
Submission
Authors are invited to submit original, unpublished full research papers and demo descriptions that are not being considered for publication in any other forum.
Please submit your paper as a PDF using Microsoft's QDB CMT site. You need to append the category tag as a suffix to the title of the paper such as “Data Management in the Year 3000 [Regular]”; “Spatial Database System [Demo]”. This must be done both in the paper file and in the CMT submission title. The suffix will not be part of the camera-ready copy if the paper is accepted.
Format
It is the authors' responsibility to ensure that their submissions adhere to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review. Note that the limit of up to 6 pages (including all figures, tables, and references) must be followed for both full papers and demos.
Publication
Accepted papers will be distributed via the CEUR workshop proceedings.
Past Events
We are building on an established tradition of eleven previous international VLDB workshops concerning data and information quality.
- QDB 2023, Vancouver, Canada, co-located with VLDB 2023. Website: https://hpi.de/naumann/projects/conferences-and-workshops-hosted/qdb-2023.html
- QDB 2016, Delhi, India, co-located with VLDB 2016. Report: https://publications.rwth-aachen.de/record/680764
- QDB 2012, Istanbul, Turkey, co-located with VLDB 2012. Report: https://sigmodrecord.org/publications/sigmodRecord/1212/pdfs/11.report.dong.pdf
- QDB 2011, Seattle, US, co-located with VLDB 2011. Website: http://qdb2011.dia.uniroma3.it/index.html
- QDB 2010, Singapore, co-located with VLDB 2010. Report: http://sigmod.org/publications/sigmodRecord/1112/pdfs/09.report.maurino.pdf
- Earlier versions of the QDB workshop were co-located with VLDB from 2007 to 2009.
- CleanDB 2006, Seoul, Korea, co-located with VLDB 2006. Website: https://pike.psu.edu/cleandb06/
- From 2004 to 2006, the predecessor workshop was termed IQIS and co-located with SIGMOD.