# Course description

Graphs have experienced an explosion in the number of applications and have proved high versatility in different domains, such as biology, social analysis, climatology. A graph is a network of nodes and relationships between them (edges) and can be used to model behaviours of individuals or objects interacting one another. Due to the large interest in graphs many methods have appeared to retrieve non trivial information from them. As a parallel with traditional data (text) graph mining studies ways to extract patterns, infer behaviours, retrieve rules, querying, predict links, finding communities and many other things.

The course is intended to provide an introduction to the broad topic of graph mining, with a focus on challenges, algorithmic solutions and new problems. Broadly speaking the course will touch the following topics:

- Background concepts: probability theory/statistics, basic linear algebra, basic graph concepts (morphisms, degrees, matrix representation, ...)
- Graph querying: exact and approximate - Reachability queries
- Frequent subgraph mining
- Graph indexing
- Random walks
- Graph clustering and anomaly detection
- Link prediction
- Summary of algorithms for different models (graph streams, evolving graphs, probabilistic graphs, colored graphs)
- Some practical graph mining framework

The course has no prerequisites, although a basic knowledge in data mining, statistics and linear algebra would be beneficial. All the basic and advanced aspects will be covered, with details on state of the art algorithms and fundamentals of the discipline.

In addition, the student will be able at the end of the course of identify a number of problems related to real world graphs and acquire specific skills in the solution of graph problems.

## News

- Deadline for project reports:
**January 24 @ 6pm** - The oral exam will be
**February 20**,**March 6-7, Room: DE 9/10** **November 1:**We remind you that you MUST fill the form for the presentation. The link is in the slides from the last lecture.

# Course Schedule

Day | Topic | Material | Extra |
---|---|---|---|

18.10 | Introduction to graph mining | slides | |

25.10 | Social network analysis - Diffusion | slides | notation |

01.11 | Querying graphs | slides | linear algebra |

08.11 | Frequent subgraph mining | slides | |

15.11 | Graph indexing | slides | |

22.11 | Node similarity and classification | slides | |

29.11 | Some practical graph mining framework - Project assignment | slides | |

06.12 | Link prediction | slides | |

13.12 | Student paper presentation [first part] | slides | |

20.12 | Christmas break | ||

27.12 | Christmas break | ||

03.01 | Non overlapping communities - first part | slides | |

10.01 | Non overlapping communities - second part | slides | extra |

17.01 | Overlapping communities | slides | |

24.01 | Report + Extra student presentation | slides | |

31.01 | Anomaly detection Probabilistic graphs, Signed networks | ||

07.02 | Student paper presentation [second part] | slides |

# Grading

The course marks will be distributed among an oral exam at the end, a paper presentation and a small pratical project with the following scheme.

- 10% Practical project and report
- 20% Paper presentation during lecture time
- 70% Oral exam

The oral exam is **solely **based on the material presented during the course **EXCLUDING** student presentations and extra material.

# Assignments

During the course there will be two assignments: a presentation and a small project.

## Presentation

The presentation will be on a paper, selected out of a list of papers provided during the course. The presentation will be either on the topics presented in the first half of the course or the second. Every student has to read the paper, understand it, and prepare slides to be presented in class in one of the two slots, depending on the topic.

The presentation time is 15 minutes divided in 12 minutes presentation + 3 minutes questions.

The presentation is **individual**, therefore any student is asked to choose a paper and prepare the slides.

The list of papers is available here: https://goo.gl/YMR0wD. More information will be announced in the course.

## Project

The project will be related to a small analysis of a network with some network library or tool provided. More details on the project and (eventual) grouping will follow shortly.

The project results has to be handed in a report of maximum **5 pages**. Any overflow will be simply ignored.

# Course Material

For this course there is no official book. All the material, including slides, references, and videos, will be posted in the official page or in the intranet. However, the following, incomplete, list, can be used as complementary material:

- Aggarwal, C.C. and Wang, H. eds., 2010.
*Managing and mining graph data*(Vol. 40). New York: Springer. - Chakrabarti, D. and Faloutsos, C., 2012. Graph mining: laws, tools, and case studies.
*Synthesis Lectures on Data Mining and Knowledge Discovery*,*7*(1), pp.1-207. - Easley, D. and Kleinberg, J., 2010.
*Networks, crowds, and markets: Reasoning about a highly connected world*. Cambridge University Press. - Rehman, S.U., Khan, A.U. and Fong, S., 2012, August. Graph mining: A survey of graph mining techniques. In
*Digital Information Management (ICDIM), 2012 Seventh International Conference on*(pp. 88-92). IEEE.

## Additional material

- Link prediction survey: https://arxiv.org/pdf/1010.0725v1
- Community detection survey: https://arxiv.org/pdf/0906.0612.pdf