SQL has been the standard language for querying relational databases for decades. DBMSs are highly optimized for storing and querying large amounts of data, but complex analysis tasks are often difficult or even impossible to express in SQL. For data science and analytics tasks, other languages and libraries, such as Python and Pandas, have become increasingly popular. However, because these Python scripts are executed on client PCs with much weaker hardware than the database server, a data scientist has to take care of buffer management for larger-than-RAM datasets and of parallelism for faster execution, both of which are problems the DBMS has already solved.
In this talk, we present Grizzly, an approach to executing operations on DataFrames inside a database system, and highlight challenges and opportunities for modern data analytics tasks. Grizzly produces SQL queries for operations on DataFrames, moving complexity from workstations to database servers. It makes it possible not only to access data already stored in a database, but also to combine it with external data from files, to execute user-defined functions, and to perform a "model join" that applies pre-trained machine learning models to data -- all inside the database system.
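
The general idea of producing SQL queries from DataFrame operations can be illustrated with a minimal sketch. It is not Grizzly's actual API: the class `LazyFrame`, its methods `select`, `filter`, `to_sql`, and `collect`, and the `events` table are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for the database server. Each operation only records what was requested, and a single SQL query is generated and sent to the database when the result is needed.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy DataFrame: records operations and emits SQL on demand.

    This is an illustrative sketch, not Grizzly's implementation.
    """

    def __init__(self, table, columns=None, predicates=None):
        self.table = table
        self.columns = list(columns) if columns else []
        self.predicates = list(predicates) if predicates else []

    def select(self, *columns):
        # Return a new frame with a projection; nothing is executed yet.
        return LazyFrame(self.table, columns, self.predicates)

    def filter(self, predicate):
        # Predicates are raw SQL fragments here for brevity; a real system
        # would build them from expression objects to avoid SQL injection.
        return LazyFrame(self.table, self.columns, self.predicates + [predicate])

    def to_sql(self):
        # Translate the recorded operations into one SELECT statement.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql

    def collect(self, connection):
        # Only now does the database do any work.
        return connection.execute(self.to_sql()).fetchall()


# Assumed example data; SQLite stands in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT, score REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", 0.5), (2, "b", 1.5), (3, "a", 2.5)],
)

query = (
    LazyFrame("events")
    .filter("kind = 'a'")
    .filter("score > 1.0")
    .select("id", "score")
)
sql_text = query.to_sql()
rows = query.collect(conn)
```

In this sketch, filtering and projection run inside the database engine, so only the matching rows are transferred to the client.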