Reranking Web Search Results for Diversity
Search engine results are often biased towards a certain aspect of a query or
towards a certain meaning for ambiguous query terms. Diversification of search
results offers a way to supply the user with a better-balanced result set,
increasing the probability that a user finds at least one document suiting her
information need. In this paper, we present a reranking approach based on
minimizing the variance of Web search results to improve topic coverage in the
top-k results. We investigate two different document representations as the
basis for reranking: smoothed language models and topic models derived by
Latent Dirichlet Allocation. To evaluate our approach, we selected 240 queries
from Wikipedia disambiguation pages. This provides us with ambiguous queries
together with a community-generated, balanced representation of their
(sub)topics. For these queries, we crawled the result pages of two major commercial search engines.
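As an illustration of the first document representation, the following sketch builds a smoothed unigram language model for a single search result. The abstract does not say which smoothing method is used, so Dirichlet-prior smoothing, the function name `dirichlet_smoothed_lm`, and the default `mu=2000.0` are assumptions made for this example. An LDA-based representation would instead describe each document by its inferred topic proportions.

```python
from collections import Counter


def dirichlet_smoothed_lm(doc_tokens, collection_tokens, mu=2000.0):
    """Return P(w | d) for every word w in the collection vocabulary.

    Assumed Dirichlet-prior smoothing (hypothetical choice for illustration):
        P(w | d) = (c(w, d) + mu * P(w | C)) / (|d| + mu)
    where c(w, d) is the count of w in the document and P(w | C) is the
    maximum-likelihood probability of w in the whole collection.
    """
    collection_counts = Counter(collection_tokens)
    collection_len = sum(collection_counts.values())
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    return {
        word: (doc_counts[word] + mu * count / collection_len) / (doc_len + mu)
        for word, count in collection_counts.items()
    }


# Hypothetical toy collection built from the result snippets of one query.
collection = "jaguar car jaguar cat speed car cat".split()
lm = dirichlet_smoothed_lm("jaguar car".split(), collection, mu=2.0)
print(lm["jaguar"])  # (1 + 2 * 2/7) / (2 + 2) = 11/28
print(lm["speed"])   # unseen in the document, but nonzero after smoothing
```

The returned probabilities sum to 1 as long as every document token also occurs in the collection, which holds when the collection is built from the retrieved results themselves.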
In addition, we present a new evaluation strategy based on Kullback-Leibler
divergence and Wikipedia. We evaluate this method using the TREC sub-topic
evaluation on the one hand and manually annotated query results on the other.
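A minimal sketch of a KL-based coverage score follows. It assumes that a Wikipedia disambiguation page yields a reference distribution over subtopics and that each top-k result has already been assigned to one subtopic. The divergence direction KL(P_wiki || P_topk), the epsilon smoothing, and the helper names `kl_divergence` and `topk_subtopic_distribution` are illustrative assumptions rather than the paper's exact definition. Lower values mean that the top-k results mirror the Wikipedia subtopic balance more closely.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) in nats for two dicts of nonnegative weights.

    Both dicts are normalized internally. eps is added to every entry of q
    so that a subtopic absent from q yields a large but finite penalty.
    """
    keys = set(p) | set(q)
    p_total = sum(p.values())
    q_total = sum(q.get(key, 0.0) + eps for key in keys)
    divergence = 0.0
    for key in keys:
        p_k = p.get(key, 0.0) / p_total
        if p_k == 0.0:
            continue  # terms with p_k = 0 contribute nothing
        q_k = (q.get(key, 0.0) + eps) / q_total
        divergence += p_k * math.log(p_k / q_k)
    return divergence


def topk_subtopic_distribution(ranked_subtopics, k):
    """Count how many of the top-k results fall into each subtopic."""
    return dict(Counter(ranked_subtopics[:k]))


# Hypothetical example: Wikipedia lists three equally weighted senses.
wiki = {"animal": 1.0, "car": 1.0, "os": 1.0}
biased = ["car", "car", "car", "animal", "os"]
diverse = ["car", "animal", "os", "car", "car"]
print(kl_divergence(wiki, topk_subtopic_distribution(biased, 3)))
print(kl_divergence(wiki, topk_subtopic_distribution(diverse, 3)))
```

With k = 3, the biased ranking covers only one sense and receives a large divergence, while the diverse ranking matches the reference distribution and scores close to zero.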
Our results show that minimizing the variance in search results by reranking
relevant pages significantly improves topic coverage in the top-k results with
respect to Wikipedia and gives a good overview of the overall search results.
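The idea of reranking to minimize variance can be illustrated with a greedy, portfolio-style sketch. This is not necessarily the paper's objective. It assumes that every result has the same unit score variance, approximates the correlation between two results by the cosine similarity of their representation vectors (for example smoothed language models or LDA topic vectors), and uses a hypothetical trade-off parameter `b`. At each step it picks the candidate whose relevance score, minus `b` times the variance it would add to the already selected list, is largest, so near-duplicates of earlier results are pushed down.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def variance_aware_rerank(scores, vectors, k, b=1.0):
    """Greedily build a top-k list trading off relevance against variance.

    Assumes unit variance and unit weight per document, so adding candidate d
    to the selected set S increases the list variance by
    1 + 2 * sum_{j in S} rho(d, j), with rho approximated by cosine similarity.
    Returns the indices of the selected documents in rank order.
    """
    remaining = list(range(len(scores)))
    selected = []
    while remaining and len(selected) < k:
        def objective(d):
            added_variance = 1.0 + 2.0 * sum(
                cosine(vectors[d], vectors[j]) for j in selected
            )
            return scores[d] - b * added_variance

        best = max(remaining, key=objective)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical example: documents 0 and 1 cover the same topic.
scores = [0.9, 0.85, 0.8, 0.5]
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(variance_aware_rerank(scores, vectors, k=3, b=0.5))  # [0, 2, 1]
print(variance_aware_rerank(scores, vectors, k=3, b=0.0))  # [0, 1, 2]
```

With `b=0.0` the sketch reduces to ranking by relevance alone. Larger values of `b` move results that differ from the ones already selected up the list.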
Moreover, latent topic models achieve competitive diversification with
significantly less reranking. Finally, our experiments reveal that the
automatic evaluation strategy using Kullback-Leibler divergence correlates well
with the alpha-nDCG scores used in manual evaluation efforts.
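For reference, alpha-nDCG credits a result for each judged subtopic it covers. That credit is discounted by (1 - alpha) for every earlier result that already covered the same subtopic, and the usual logarithmic rank discount is then applied. The sketch below assumes binary subtopic judgments. It normalizes by a greedily constructed ideal ranking, the usual approximation because finding the exact ideal ordering is computationally hard. The function names and data are illustrative.

```python
import math


def alpha_dcg(ranking, judgments, alpha=0.5, k=10):
    """alpha-DCG@k for a ranking of doc ids.

    judgments maps a doc id to the set of subtopics it is relevant to;
    unjudged documents contribute no gain.
    """
    seen = {}  # subtopic -> number of earlier documents covering it
    total = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for topic in judgments.get(doc, ()):
            gain += (1 - alpha) ** seen.get(topic, 0)
            seen[topic] = seen.get(topic, 0) + 1
        total += gain / math.log2(rank + 1)
    return total


def greedy_ideal(judgments, alpha=0.5, k=10):
    """Approximate the ideal ranking by greedily maximizing per-step gain."""
    pool = list(judgments)
    seen = {}
    ideal = []
    while pool and len(ideal) < k:
        def gain(doc):
            return sum((1 - alpha) ** seen.get(t, 0) for t in judgments[doc])

        best = max(pool, key=gain)
        ideal.append(best)
        pool.remove(best)
        for topic in judgments[best]:
            seen[topic] = seen.get(topic, 0) + 1
    return ideal


def alpha_ndcg(ranking, judgments, alpha=0.5, k=10):
    """alpha-DCG@k normalized by the greedy ideal alpha-DCG@k."""
    ideal = alpha_dcg(greedy_ideal(judgments, alpha, k), judgments, alpha, k)
    return alpha_dcg(ranking, judgments, alpha, k) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for an ambiguous query such as "jaguar".
judgments = {"d1": {"car"}, "d2": {"car"}, "d3": {"animal"}, "d4": {"os"}}
print(alpha_ndcg(["d1", "d3", "d4", "d2"], judgments, alpha=0.5, k=4))  # 1.0
print(round(alpha_ndcg(["d1", "d2", "d3", "d4"], judgments, alpha=0.5, k=4), 3))  # 0.957
```

The redundant ranking places a second "car" result at rank 2, which earns only half the gain there and delays the other senses. It therefore scores below the diverse ranking.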