Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Description

In this seminar, students learn data mining techniques for discovering frequent itemsets. Frequent itemset discovery is a useful technique for analyzing data, generating association rules, deriving machine learning features, and many other applications.
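To make the notion of a frequent itemset concrete, the following is a minimal sketch of level-wise (Apriori-style) mining in Python. It is an illustration only, not the AprioriTID variant covered in the seminar; the function name `frequent_itemsets`, the parameter `min_support` (an absolute transaction count), and the toy shopping-basket data are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = Counter(c for t in transactions
                         for c in candidates if c <= t)
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, support in sorted(frequent_itemsets(baskets, 2).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

With a minimum support of 2, the pair {milk, butter} is dropped because it occurs in only one basket, and the triple {bread, milk, butter} is then pruned without being counted, since one of its subsets is infrequent.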

We expect the students to examine existing techniques by implementing (and possibly improving) one approach and an extension of that approach (for instance "multisets" or "utility patterns"). You should develop a suitable use case for that extension of the chosen frequent pattern analysis algorithm. The students are free to use any data set for their use case. We regard the DBpedia Infobox triples as a promising option. At the end of the seminar, the students are asked to evaluate their implemented algorithm on their use case both quantitatively and qualitatively.

The maximum number of students is 6, resulting in 3 teams.

Important Dates

■ May 15th: intermediate presentation 

■ July 10th: final presentations in room A-1.1

■ August 31st: short paper deadline

Slides

Date              Topic                           Slides
April 10th 2012   Introduction and Organization   pdf
May 15th 2012     AprioriTID                      pdf
                  Eclat                           pdf
                  FP-Growth                       pdf
                  Relim                           pdf
July 10th 2012    Parallel AprioriTID             pdf
                  MaxClique vs Eclat              pdf
                  Incremental FP-Growth           pdf
                  Fuzzy Relim                     pdf

Grading Process

3 credit points (LP)

Paper presentation and final presentation

Participation during presentations / discussions

Implementation of strategies/proposed extensions

6-page evaluation report

Literature

R. Agrawal & R. Srikant
Fast Algorithms for Mining Association Rules
VLDB '94 

J. Han, J. Pei & Y. Yin
Mining Frequent Patterns without Candidate Generation
SIGMOD '00

M. J. Zaki
Scalable Algorithms for Association Mining 
TKDE '00