International Workshop on Quality in Databases (QDB) 2024

13th International Workshop on Quality in Databases at the 50th VLDB conference

August 26, 2024, Guangzhou, China

Welcome

News

The workshop program is online (last update 20.8.2024).
Paper notifications are out.
Submission deadline was extended upon request to June 14. Submit your paper via CMT here.
Quanqing Xu (Senior Researcher at Oceanbase) will share his experience on data quality in an industry talk at QDB'24.
QDB'24 will feature an invited keynote by Sebastian Schelter (TU Berlin) to talk about his latest research on data quality.
The QDB’24 workshop proposal has been accepted as a VLDB workshop.

Quality in Databases

Data quality has been a major concern of organizations for decades. The recent advances in artificial intelligence (AI) have brought data quality (DQ) back into the spotlight: while many recent data quality and cleaning solutions are powered by ML, DQ is a core requirement to ensure reliable AI-based systems. DQ is tackled from different perspectives by different research communities, including database, machine learning (ML), and information systems. We believe it is important to bring together these communities to foster a vital discussion about the future of DQ assessment and improvement.

Considering the large number of participants (>50) at QDB’23, QDB'24 aims to (1) continue to host the vital discussions about data quality, and (2) exchange best practices and novel methods for (semi-)automated (ML-based) data quality assessment and improvement in the context of AI-based systems.

Program

09:00 - 09:15	Opening	Lisa & Hazar
09:15 - 10:30	Research Session 1	Chair: Lisa
	Accelerating the Data Cleaning Systems Raha and Baran through Task and Data Parallelism Fatemeh Ahmadi, Yusuf Mandirali, Ziawasch Abedjan
	Valuation-based Data Acquisition for Machine Learning Fairness Ekta and Romila Pradhan
	AutoFAIR: Automatic Data FAIRification via Machine Reading Tingyan Ma, Wei Liu, Bin Lu, Xiaoying Gan, Yunqiang Zhu, Luoyi Fu, Chenghu Zhou
10:30 - 11:00	Coffee break
11:00 - 12:30	Keynote + 1 Research Paper	Chair: Hazar
	Invited talk: Sebastian Schelter, How Data Management Research Helps to Improve Real World ML Applications
	Compute Engine Testing with Privacy-Compliant Production-Like Synthetic Data Yu Liu, Jiangnan Cheng, Steve Chuck, Lyublena Antova, Yurgis Baykshtis, Matt David, Ge Gao, Mehrdad Honarkhah, Kuan-Sung Huang, Chen-Kuei Lee, Usman Muhammad, Shihao Peng, Andrii Rosa, Rebecca Schlussel, Michael Shang, Kelvin Silva, Brandon Vo, Zac Wen, Yihao Zhou
12:30 - 14:00	Lunch
14:00 - 15:30	Industry Session	Chair: Sourav
	Industry talk by Quanqing Xu (Oceanbase)
	Industry talk by Divesh Srivastava (AT&T Labs)
	Panel discussion with Quanqing Xu, Divesh Srivastava, and Fatma Ozcan
15:30 - 16:00	Coffee break
16:00 - 18:15	Research Session 2	Chair: Hazar
	Process Model-based Access Control Policies for Cross-Organizational Data Sharing Liam Tirpitz, Leon Gentges
	Tracking Consistency over Data Streams with InkStream [Demo] Samuele Langhi, Angela Bonifati, Riccardo Tommasini
	A Data Generator to Explore the Interactions Between Concept Drifts and Anomalies [Demo] Jongjun Park, Akanksha Nehete, Tammy Zeng, Fei Chiang
	Towards Semi-Supervised Data Quality Detection In Graphs Rubab Zahra Sarfraz
18:15 - 18:30	Closing	Chairs

Program Committee

Program Chairs

Sourav S Bhowmick (Nanyang Technological University, Singapore)
Lisa Ehrlinger (Hasso Plattner Institute, University of Potsdam, Germany)
Hazar Harmouch (University of Amsterdam, Netherlands)

Steering Committee

Ihab Ilyas (Apple, University of Waterloo, USA)
Felix Naumann (Hasso Plattner Institute, University of Potsdam, Germany)

Program Committee

Ziawasch Abedjan (TU Berlin, Germany)
Antoon Bronselaer (Ghent University, Belgium)
Felix Biessmann (Einstein Center Digital Future, Germany)
Ismael Caballero (University of Castilla La Mancha, Spain)
Cinzia Cappiello (Politecnico di Milano, Italy)
Chang Ge (University of Minnesota, USA)
Christine Legner (University of Lausanne, Switzerland)
Sebastian Link (University of Auckland, New Zealand)
Elizabeth Pierce (University of Arkansas at Little Rock, USA)
Kai-Uwe Sattler (TU Ilmenau, Germany)
Sebastian Schelter (University of Amsterdam, Netherlands)
John Talburt (University of Arkansas at Little Rock, USA)
Panos Vassiliadis (University of Ioannina, Greece)
Wolfram Wöß (Johannes Kepler University Linz, Austria)

Keynote

Title: How Data Management Research Helps to Improve Real World ML Applications

Abstract: The talk will give an overview of our past and recent research to improve data quality in ML applications, based on proven principles and techniques from data management. In particular, we will cover work on declarative data unit tests tailored for large-scale data lakes, on reasoning about the datasets for ML applications by treating ML pipelines as algebraic queries, and on leveraging fine-grained data provenance as a foundation for data debugging systems.

Sebastian Schelter is a Full Professor at the Berlin Institute on the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin. His research is focused on the intersection of data management and machine learning with the goal to foster the responsible management of data and to democratise data science technologies. The research of his group is accompanied by efficient and scalable open source implementations, many of which are applied in real world use cases, for example in the Amazon Web Services cloud and in large European e-commerce platforms. In the past, he has been an assistant professor at the University of Amsterdam, a faculty fellow at New York University, a senior applied scientist at Amazon Research and a research intern at Twitter and IBM Almaden in California. His research contributions have been recognized with an ACM SIGMOD Systems Award, an ACM SIGMOD Best Demo Runner Up Award, and a Best Paper Runner Up Award from the Table Representation Learning workshop at NeurIPS.

Call for Papers

Topics of Interest

The focus is on new and practical methods for (semi-)automated (ML-based) data quality assessment and improvement. The topics of interest include, but are not limited to:

Data preprocessing
Data profiling for data quality measurement
Explainable data cleaning
DQ requirements for generative AI systems
DQ using generative AI
Data quality assessment for AI-based systems
Data quality improvement / data cleaning for AI-based systems
Benchmark data sets to evaluate DQ assurance methods
Automation of DQ assessment and improvement methods
Methods to scale data quality assessment and cleansing
ML-powered methods for improving data quality
Data quality in graph-structured or time-series data
Metadata management to improve data quality
Data quality in different data science domains
Human-in-the-loop approaches for DQ
Post-training quality / fact checking
FAIRness in data quality

Important Dates

Submission deadline
(May 31, 2024, 9pm PST)
Extension to June 14, 2024, 9pm PST

Notification
July 22, 2024

Final version
August 5, 2024

Workshop
August 26, 2024

Manuscript Preparation

Submission
Authors are invited to submit original, unpublished full research papers and demo descriptions that are not being considered for publication in any other forum.
Please submit your paper as a PDF using Microsoft's QDB CMT site. You need to append the category tag as a suffix to the title of the paper, for example “Data Management in the Year 3000 [Regular]” or “Spatial Database System [Demo]”. This must be done both in the paper file and in the CMT submission title. The suffix will not be part of the camera-ready copy if the paper is accepted.

Format
It is the authors' responsibility to ensure that their submissions adhere to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review. Note that the limit of up to 6 pages (including all figures, tables, and references) must be followed for both full papers and demos.

Publication
Accepted papers will be distributed via the CEUR workshop proceedings.

Past Events

We are building on an established tradition of eleven previous international VLDB workshops concerning data and information quality.

QDB 2023, Vancouver, Canada, co-located with VLDB 2023. Website: https://hpi.de/naumann/projects/conferences-and-workshops-hosted/qdb-2023.html
QDB 2016, Delhi, India, co-located with VLDB 2016. Report: https://publications.rwth-aachen.de/record/680764
QDB 2012, Istanbul, Turkey, co-located with VLDB 2012. Report: https://sigmodrecord.org/publications/sigmodRecord/1212/pdfs/11.report.dong.pdf
QDB 2011, Seattle, US, co-located with VLDB 2011. Website: http://qdb2011.dia.uniroma3.it/index.html
QDB 2010, Singapore, co-located with VLDB 2010. Report: http://sigmod.org/publications/sigmodRecord/1112/pdfs/09.report.maurino.pdf
Earlier versions of the QDB workshop were co-located with VLDB from 2007 to 2009.
CleanDB 2006, Seoul, Korea, co-located with VLDB 2006. Website: https://pike.psu.edu/cleandb06/
From 2004 to 2006, the predecessor workshop was termed IQIS and co-located with SIGMOD.