Projects of the Spatial Data Analysis and Large Scale Data Processing Group
Current Projects
Semantic Multi-Modal Database Operators
Semantic filters powered by large vision-language models (LVLMs) enable users to query image databases and data lakes using natural language predicates. For example, a user can filter marketplace listings for "suspicious offers" without explicit rules. However, LVLM inference is expensive: queries that run in milliseconds on structured data can take minutes on unstructured data such as large image collections.
We propose a multi-stage, cascading filtering pipeline using pre-computed embeddings to prune candidates before LVLM evaluation. Dynamic logical query-plan rewriting for embedding-based filtering allows the decomposition of complex queries into easy-to-classify sub-queries, increasing pruning power and, consequently, reducing the number of calls to the LVLM.