In this course we will develop large-scale data mining techniques and research prototypes. Each team, consisting of three students, must identify a challenging big data problem and solve it using a distributed computing framework. Distributed computing, especially with the MapReduce framework Apache Hadoop, is already widely adopted. In this seminar, each team will apply one of two more recent frameworks, Apache Spark or Apache Flink, to solve its challenge. Students will have access to a computing cluster on Amazon EC2. Because these resources are limited, we can work with only a small number of students, and enrollment will be limited.
This is a project course. There will be only a few weekly lectures, including one or two introductory lectures. We will spend the quarter working in teams on different projects related to large-scale data mining. Each team will meet frequently with its assigned mentor.