Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Gerardo Vitagliano

Affiliation: MIT CSAIL
Title: Agentic data systems and Multimodal Analytics

 

Abstract

Over the past few years, retrieval-augmented generation (RAG) and LLM-based agents have rapidly evolved from simple document retrievers to complex architectures capable of orchestrating end-to-end data pipelines. This progress has opened the door to querying data in ways that blur the line between search, question answering, and database-like operations. In this talk, we cover some of the recent systems that incorporate optimization layers, dynamic retrieval, or reasoning strategies to query unstructured data. Yet, when moving from text-only corpora to multimodal datasets, where information may be stored as images, tables, or structured metadata, the challenges multiply: join-like reasoning across modalities, scaling over high-cardinality queries, and balancing precision with recall under context limits. To address these challenges, we present our vision for an analytical system over unstructured data. We highlight the open challenges in the design of such a system and in the often-overlooked problem of benchmarking, and present some of the most recent efforts to evaluate such systems on real-world pipeline data.

Short CV

Gerardo Vitagliano is a postdoctoral associate at the Data Systems Group of MIT CSAIL. He completed his doctoral studies at the Hasso Plattner Institute, with a thesis on structural representations of tabular data files. His main research goal is making data accessible to everyone: this includes an interest in multimodal data integration, optimizing the performance of LLM and agent-based systems, and assisting non-technical users in the design and deployment of complex data pipelines.