We are excited to announce that our paper, “TPCx-AI under the Microscope: A Benchmarking Debt Analysis,” was accepted at VLDB 2026.
Title
TPCx-AI under the Microscope: A Benchmarking Debt Analysis
Authors
Ilin Tolovski (HPI), Philipp Hildebrandt (HPI), Khuzaima Daudjee (University of Waterloo), Tilmann Rabl (HPI)
Abstract
TPCx-AI is an industry-standard benchmark for evaluating the end-to-end performance of machine learning systems and the underlying hardware configurations. In the database community, individual parts of the dataset and the workloads are used to evaluate preprocessing methods and systems for fast inference. In both of these cases, the datasets and workloads are used based on the characteristics defined in the specification. Upon analysis of TPCx-AI’s dataset and use cases, we observe that the official implementation of TPCx-AI’s kit diverges from the specification, does not evaluate the capabilities of the system under test, and impacts the overall performance in a benchmark run. In this paper, we investigate the benchmarking debt accumulated in the TPCx-AI dataset and the workloads. We identify properties that impact the benchmark’s performance, including runtime and quality of use cases, the defined metrics and their thresholds, workload discrepancies, and data errors. Our analysis shows that all use cases and datasets contain benchmarking debts, impacting the training and serving runtimes by up to 350× and 800×, respectively.
By addressing these debts, we observe an end-to-end throughput increase of up to 3.8× over the default TPCx-AI implementation.