Hasso-Plattner-Institut
Prof. Dr. Emmanuel Müller
 

Course description

Graphs have experienced an explosion in the number of applications and have proved high versatility in different domains, such as biology, social analysis, climatology. A graph is a network of nodes and relationships between them (edges) and can be used to model behaviours of individuals or objects interacting one another. Due to the large interest in graphs many methods have appeared to retrieve non trivial information from them. As a parallel with traditional data (text) graph mining studies ways to extract patterns, infer behaviours, retrieve rules, querying, predict links, finding communities and many other things. 

The course is intended to provide an introduction to the broad topic of graph mining, with a focus on challenges, algorithmic solutions and new problems. Broadly speaking the course will touch the following topics: 

  • Background concepts: probability theory/statistics, basic linear algebra, basic graph concepts (morphisms, degrees, matrix representation, ...)
  • Graph querying: exact and approximate - Reachability queries
  • Frequent subgraph mining
  • Graph indexing
  • Random walks
  • Graph clustering and anomaly detection
  • Link prediction
  • Summary of algorithms for different models (graph streams, evolving graphs, probabilistic graphs, colored graphs)
  • Some practical graph mining framework

The course has no prerequisites, although a basic knowledge in data mining, statistics and linear algebra would be beneficial. All the basic and advanced aspects will be covered, with details on state of the art algorithms and fundamentals of the discipline.

In addition, the student will be able at the end of the course of identify a number of problems related to real world graphs and acquire specific skills in the solution of graph problems.

News

  • Deadline for project reports: January 24 @ 6pm
  • The oral exam will be February 20March 6-7, Room: DE 9/10
  • November 1: We remind you that you MUST fill the form for the presentation. The link is in the slides from the last lecture. 

Course Schedule

DayTopicMaterialExtra
18.10Introduction to graph miningslides
25.10Social network analysis - Diffusionslidesnotation
01.11  Querying graphsslideslinear algebra
08.11Frequent subgraph miningslides
15.11Graph indexing slides
22.11Node similarity and classificationslides
29.11Some practical graph mining framework  - Project assignmentslides
06.12Link predictionslides
13.12Student paper presentation [first part]slides
20.12Christmas break
27.12Christmas break
03.01Non overlapping communities - first partslides
10.01Non overlapping communities - second partslidesextra
17.01Overlapping communitiesslides
24.01Report + Extra student presentationslides
31.01

Anomaly detection

Probabilistic graphs, Signed networks 

slides

slides

07.02Student paper presentation [second part] slides

Grading

The course marks will be distributed among an oral exam at the end, a paper presentation and a small pratical project with the following scheme. 

  • 10% Practical project and report
  • 20% Paper presentation during lecture time
  • 70% Oral exam

The oral exam is solely based on the material presented during the course EXCLUDING student presentations and extra material.

Assignments

During the course there will be two assignments: a presentation and a small project.

Presentation

The presentation will be on a paper, selected out of a list of papers provided during the course. The presentation will be either on the topics presented in the first half of the course or the second. Every student has to read the paper, understand it, and prepare slides to be presented in class in one of the two slots, depending on the topic.

The presentation time is 15 minutes divided in 12 minutes presentation + 3 minutes questions. 

The presentation is individual, therefore any student is asked to choose a paper and prepare the slides.

The list of papers is available here: https://goo.gl/YMR0wDMore information will be announced in the course. 

Project

The project will be related to a small analysis of a network with some network library or tool provided. More details on the project and (eventual) grouping will follow shortly. 

The project results has to be handed in a report of maximum 5 pages. Any overflow will be simply ignored.

Course Material

For this course there is no official book. All the material, including slides, references, and videos, will be posted in the official page or in the intranet. However, the following, incomplete, list, can be used as complementary material: 

  • Aggarwal, C.C. and Wang, H. eds., 2010.Managing and mining graph data (Vol. 40). New York: Springer.
  • Chakrabarti, D. and Faloutsos, C., 2012. Graph mining: laws, tools, and case studies. Synthesis Lectures on Data Mining and Knowledge Discovery7(1), pp.1-207.
  • Easley, D. and Kleinberg, J., 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.
  • Rehman, S.U., Khan, A.U. and Fong, S., 2012, August. Graph mining: A survey of graph mining techniques. In Digital Information Management (ICDIM), 2012 Seventh International Conference on (pp. 88-92). IEEE.

Additional material