Over the past few years, retrieval-augmented generation (RAG) and LLM-based agents have rapidly evolved from simple document retrievers into complex architectures capable of orchestrating end-to-end data pipelines. This progress has opened the door to querying data in ways that blur the line between search, question answering, and database-like operations. In this talk, we cover recent systems that incorporate optimization layers, dynamic retrieval, or reasoning strategies to query unstructured data. Yet, when moving from text-only corpora to multimodal datasets (where information may be stored as images, tables, or structured metadata), the challenges multiply: join-like reasoning across modalities, scaling to high-cardinality queries, and balancing precision against recall under context-window limits. To address these challenges, we present our vision for an analytical system over unstructured data. We highlight the open challenges in designing such a system, as well as the often-overlooked problem of benchmarking, and present some of the most recent efforts to evaluate such systems on real-world data pipelines.