Abstract: Summarizing a large dataset with a reduced-size data synopsis has applications ranging from database query optimization to approximate query processing. Increasingly, data synopsis approaches leverage the inherent compression properties of machine learning (ML) models to achieve state-of-the-art results. This talk will deconstruct this trend to understand the key mechanisms behind machine learning's recent success in a historically well-established area of research. (The Good) I present a series of results that suggest ML models are astonishingly accurate at many different types of high-dimensional data summarization. (The Bad) I show that in "medium-dimensional" regimes it is possible to design new classical data synopsis techniques that meet or exceed the performance of ML models. (The Ugly) I discuss the under-appreciated reliability gap between ML models and classical data summarization techniques.
Bio: Sanjay Krishnan is an Assistant Professor of Computer Science at the University of Chicago. His research studies the intersection of machine learning and database systems. Sanjay completed his PhD and Master’s degree in Computer Science at UC Berkeley in 2018. Sanjay's work has received a number of awards, including the 2016 SIGMOD Best Demonstration award, the 2015 IEEE GHTC Best Paper award, and the Sage Scholar award.
Research webpage: http://sanjayk.io/?src=%2F~skr%2F