Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems. Bachrach, Yoram; Herbrich, Ralf; Porat, Ely (2009). 344–352.
Collaborative filtering (CF) shares information between users to provide each with recommendations. Previous work suggests using sketching techniques to handle massive data sets in CF systems, but only allows testing whether users have a high proportion of items they have both ranked. We show how to determine the correlation between the rankings of two users, using concise sketches of the rankings. The sketches allow approximating Kendall's Tau, a known rank correlation, with high accuracy \($\epsilon$\) and high confidence \($1-\delta$\). The required sketch size is logarithmic in the confidence and polynomial in the accuracy.
Novel Tools to Streamline the Conference Review Process: Experiences from SIGKDD’09. Flach, Peter; Spiegler, Sebastian; Golenia, Bruno; Price, Simon; Guiver, John; Herbrich, Ralf; Graepel, Thore; Zaki, Mohammed in SIGKDD Explorations (2009). 11(2) 63–67.
The SIGKDD'09 Research Track received 537 paper submissions, which were reviewed by a Program Committee of 199 members, and a Senior Program Committee of 22 members. We used techniques from artificial intelligence and data mining to streamline and support this complicated process at three crucial stages: bidding by PC members on papers, assigning papers to reviewers, and calibrating scores obtained from the reviews. In this paper we report on the approaches taken, evaluate how well they worked, and describe some further work done after the conference.
Scalable Clustering and Keyword Suggestion for Online Advertisements. Schwaighofer, Anton; Candela, Joaquin Qui nonero; Borchert, Thomas; Graepel, Thore; Herbrich, Ralf (2009). 27–36.
We present an efficient Bayesian online learning algorithm for clustering vectors of binary values based on a well known model, the mixture of Bernoulli profiles. The model includes conjugate Beta priors over the success probabilities and maintains discrete probability distributions for cluster assignments. Clustering is then formulated as inference in a factor graph which is solved efficiently using online approximate message passing. The resulting algorithm has three key features: a) it requires only a single pass across the data and can hence be used on data streams, b) it maintains the uncertainty of parameters and cluster assignments, and c) it implements an automatic step size adaptation based on the current model uncertainty. The model is tested on an artificially generated toy dataset and applied to a large scale real-world data set from online advertising, the data being online ads characterized by the set of keywords to which they have been subscribed. The proposed approach scales well for large datasets, and compares favorably to other clustering algorithms on the ads dataset. As a concrete application to online advertising we show how the learned model can be used to recommend new keywords for given ads.
Matchbox: Large Scale Online Bayesian Recommendations. Stern, David; Herbrich, Ralf; Graepel, Thore (2009). 111–120.
We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional trait space in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like/ don't like) and observation of a set of ordinal ratings on a userspecific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item's popularity, a user's taste or a user's personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance with the option of improving performance further if computational resources are available by performing multiple EP passes over the training data.