We offer 4 topics that are well-described in .
Calculate PageRank on a cluster efficiently (Chapter 5.2) and implement one extension countering link spam (either TrustRank (5.4.4) or SpamMass (5.4.5)).
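As a starting point, the following is a minimal single-machine sketch of PageRank by power iteration, not the cluster implementation the topic asks for. The function name `pagerank`, the parameters `beta` and `teleport`, and the adjacency-dict input format are assumptions made for illustration. Restricting the teleport set to a list of trusted pages gives the basic idea behind TrustRank; on a cluster, the inner loop over out-links would correspond to the map step and the summation per target page to the reduce step.

```python
def pagerank(links, beta=0.85, iterations=100, teleport=None):
    """Power iteration over an adjacency dict {page: [pages it links to]}.

    teleport: optional set of pages that receive the teleport mass.
    With teleport=None every page is a teleport target (plain PageRank);
    with a set of trusted pages the result is a TrustRank-style score.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    targets = sorted(teleport) if teleport else nodes

    # Teleport distribution: uniform over the chosen target pages.
    base = {n: 0.0 for n in nodes}
    for t in targets:
        base[t] = 1.0 / len(targets)

    rank = dict(base)
    for _ in range(iterations):
        new_rank = {n: (1.0 - beta) * base[n] for n in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # "Map": a page sends an equal share of its rank along each link.
                share = beta * rank[u] / len(outs)
                for v in outs:
                    new_rank[v] += share  # "Reduce": sum the incoming shares.
            else:
                # Dead end: hand its rank back to the teleport targets.
                for n in nodes:
                    new_rank[n] += beta * rank[u] * base[n]
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
    print(pagerank(graph))
    print(pagerank(graph, teleport={"a"}))
```

In the second call only page `a` is trusted, so a page that no trusted page can reach through links ends up with a score of zero.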
Clustering in Non-Euclidean Space
Clustering groups similar items according to a distance measure. Chapter 7.5 introduces clustering in non-Euclidean space, and Chapter 7.6.6 briefly outlines a parallel implementation.
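In a non-Euclidean space there is no meaningful average of points, so a cluster is usually represented by one of its own members, a clustroid. The sketch below illustrates that idea only and is not the full algorithm of Chapter 7.5: it uses the standard Levenshtein edit distance on strings as an example distance and picks the member with the smallest sum of squared distances to the others. The function names `edit_distance` and `clustroid` are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def clustroid(points, dist):
    """Return the member with the smallest sum of squared distances
    to all members of the cluster (ties go to the earliest member)."""
    return min(points, key=lambda p: sum(dist(p, q) ** 2 for q in points))


if __name__ == "__main__":
    cluster = ["abc", "abcd", "abce", "xbc"]
    print(clustroid(cluster, edit_distance))
```

Any other distance function with the same two-argument signature can be passed as `dist`, for example a Jaccard distance on sets.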
Frequent itemsets (Chapter 6) are sets of items that often occur together in a large data set, e.g., books that are regularly bought together at Amazon. The SON algorithm parallelizes well with MapReduce, as described in Chapter 6.4.4.
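The two passes of the SON algorithm can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions: the names `son` and `count_itemsets`, the chunking scheme, and the `max_size` limit are hypothetical, and the local pass counts itemsets by brute force where a real implementation would run an in-memory algorithm such as A-Priori on each chunk. Each loop over chunks stands in for one MapReduce round.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(baskets, max_size):
    """Count every itemset of size 1..max_size occurring in the baskets."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def son(baskets, support, num_chunks=2, max_size=2):
    """Return {itemset: count} for itemsets appearing in >= support baskets."""
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]

    # Pass 1: each chunk reports its locally frequent itemsets, using the
    # support threshold scaled down to the chunk's share of the data.
    candidates = set()
    for chunk in chunks:
        local_support = support * len(chunk) / len(baskets)
        local_counts = count_itemsets(chunk, max_size)
        candidates |= {s for s, c in local_counts.items() if c >= local_support}

    # Pass 2: count only the candidates over all chunks and keep those
    # that reach the global support threshold.
    totals = Counter()
    for chunk in chunks:
        for itemset, c in count_itemsets(chunk, max_size).items():
            if itemset in candidates:
                totals[itemset] += c
    return {s: c for s, c in totals.items() if c >= support}


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["milk", "eggs"],
        ["bread", "milk"],
    ]
    print(son(baskets, support=3))
```

The first pass can produce false positives but no false negatives: an itemset that is globally frequent must be locally frequent in at least one chunk, so the second pass only has to filter the candidates.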
Collaborative filtering is a technique for recommending items to users based on a large knowledge base of previous user-item relations, e.g., purchases or ratings. Chapter 9 covers recommendation systems in general; one approach that can be parallelized is parallel stochastic gradient descent.
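To make the stochastic gradient descent idea concrete, here is a minimal sequential sketch that factorizes a small rating set into user and item factors. It is an assumption-laden illustration, not the parallel algorithm itself: the names `factorize` and `rmse`, the hyperparameters, and the toy ratings are all hypothetical. A parallel variant would typically apply the same update to blocks of the rating matrix that share no users or items, which is an assumption about the approach rather than something stated above.

```python
import random


def factorize(ratings, num_users, num_items, rank=2,
              lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn factors P (users) and Q (items) so that the dot product
    P[u] . Q[i] approximates each observed rating (u, i, r)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the ratings in random order
        for u, i, r in samples:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the predictions on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


if __name__ == "__main__":
    # (user, item, rating) triples; purely made-up example data.
    ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4),
               (1, 1, 5), (2, 0, 1), (2, 1, 2), (2, 2, 5)]
    P, Q = factorize(ratings, num_users=3, num_items=3)
    print("training RMSE:", rmse(ratings, P, Q))
```

The training error alone says little about recommendation quality; a real project would hold out part of the ratings and report the error on those.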