Projects of the Spatial Data Analysis and Large Scale Data Processing Group

Current Projects

Semantic Multi-Modal Database Operators

Semantic filters powered by large vision-language models (LVLMs) enable users to query image databases and data lakes using natural language predicates. For example, a user can filter marketplace listings for "suspicious offers" without explicit rules. However, LVLM inference is expensive: queries that run in milliseconds on structured data can take minutes on unstructured data such as large image collections.

We propose a multi-stage, cascading filtering pipeline using pre-computed embeddings to prune candidates before LVLM evaluation. Dynamic logical query-plan rewriting for embedding-based filtering allows the decomposition of complex queries into easy-to-classify sub-queries, increasing pruning power and, consequently, reducing the number of calls to the LVLM.