Hasso-Plattner-Institut
  
Prof. Dr. Felix Naumann
  
 

Description

In this seminar we will start a distributed computation project from scratch. The first step is installing and configuring Hadoop on numerous commodity PCs. We will then explore the functionality and tuning of the distributed file system as a basis for distributed computing. The main part of the seminar is the design of distributed algorithms based on map/reduce. You will work in teams of two to design, implement, run, and performance-tune your solutions.
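
To give a feel for the map/reduce model mentioned above, here is a minimal in-memory sketch of a word count in plain Java. It is only an illustration under stated assumptions: it does not use any Hadoop classes, it runs on a single machine, and the class name WordCountSketch and its methods map, reduce, and run are hypothetical names chosen for this example, not part of the seminar material.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the map/reduce model; no Hadoop classes are used (assumption). */
final class WordCountSketch {

    /** Map phase: emit one (word, 1) pair for every word in a line of text. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: sum all counts that were emitted for a single word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Driver: map every line, group the pairs by key (the "shuffle"), reduce each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(Arrays.asList("to be or not to be", "to do")));
    }
}
```

In a real Hadoop job the grouping step is performed by the framework across machines, and the map and reduce functions are the only parts written by hand; the sketch keeps all three steps in one process so the data flow is visible.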

Organization

  • 12 participants
  • 6 topics
  • participants rate every topic
  • topics will be assigned according to the ratings
  • interested students are required to attend the first meeting
  • Supervisors: Alexander Albrecht, Christoph Böhm
  • seminar will be held in German
  • Date: Monday 15:15 – 16:45, A 2-1

The first organizational meeting will be on Monday 20.04.2009 in A 2-1.

Schedule

20.04.2009
27.04.2009
04.05.2009
11.05.2009
18.05.2009
25.05.2009  Presentation of intermediate results
01.06.2009  Whit Monday (Pfingstmontag)
08.06.2009
15.06.2009  Talk about ongoing research: Fabian Hueske, Stephan Ewen (TU Berlin)
22.06.2009
25.06.2009  17:00: 5th Apache Hadoop Get Together @ Berlin
29.06.2009
06.07.2009

Final Demos & Presentations I

Similarity Join (Andrina Mascher & Tim Felgentreff)
TF/IDF (Florian Thomas & Christian Ress)

13.07.2009

Final Demos & Presentations II

Association Rules (Cindy Faehnrich & Jossekin Beilharz)
Phrase Subsumption (Philipp Berger & Thomas Zimmermann)

20.07.2009

Final Demos & Presentations III

Hadoop Scripting (Konstantin Haase & Johan Uhle)
Clustering (Robert Pfeiffer & Tobias Schmidt)

Requirements

  • You are expected to attend all sessions.
  • You have to design and implement a map/reduce solution in Java.
  • Give a talk (incl. demo) about your topic. You have 30 minutes to explain the topic to your fellow students, who will then spend 15 minutes discussing and commenting on the topic and the talk.
  • Submit a report (5 pages) on your assigned topic. The report should discuss (not summarize) the assigned work: its strengths and weaknesses, your suggestions and comments, ...
  • Your final grade depends on your talk, your understanding of the topic (shown by answering challenging questions), your report, your participation in the discussions (including asking questions), and your attendance.

References

http://labs.google.com/papers/mapreduce.html
http://hadoop.apache.org/