The past few years have been marked by several breakthrough results in generative AI, culminating in the rise of tools like ChatGPT that can solve a variety of language-related tasks without specialized training. However, this power comes at a cost, which makes it challenging to scale LLM-based processing to large data sets. In this talk, I outline several techniques that we have used successfully to make LLM inference more efficient. In particular, I discuss ThalamusDB, a system that performs approximate data processing, applying LLMs to carefully selected data subsets to maximize precision. I also introduce our work on SpareLLM, which helps users select the cheapest model that satisfies their quality constraints when applied to a large data set for a given task. Beyond our work on making LLM usage more efficient, I also touch upon several new use cases for LLMs in data management.