Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

About the Talk

Deploying a machine learning pipeline is resource-intensive and requires both data engineering and software engineering expertise. Even with meticulous testing, run-time errors during pipeline operation remain a significant risk. In this talk, we focus on run-time errors caused by mismatches between the data to be processed and the code that processes it. We present strategies for detecting these mismatches early through static and dynamic analysis, leveraging techniques from JSON Schema reasoning. Specifically, we showcase our recent contributions to three essential tasks: schema validation, schema extraction, and checking schema containment. We also give an outlook on the challenges introduced by the latest drafts of JSON Schema. We conclude with a discussion of application domains for our contributions beyond hardening machine learning pipelines against run-time errors.

About the Speaker

Stefanie Scherzinger is a Full Professor at the University of Passau, Germany.
She earned her Ph.D. from Saarland University in 2008.

She then pursued an industry career, first working for IBM, then for Google.
Her real-world experiences shaped her perspective and laid the foundation for her academic endeavors.

In 2012, Stefanie Scherzinger returned to academia, assuming a professorship at OTH Regensburg, Germany.
Since early 2020, she has been chairing the "Scalable Database Systems" group at the University of Passau.

Her research is deeply influenced by her industry background, with a particular focus on maintaining database applications, especially those backed by NoSQL data stores. Stefanie Scherzinger is committed to the long-term maintainability of software systems and to long-term reproducibility in computer science research.