Hasso-Plattner-InstitutSDG am HPI

Dissecting the complex - efficient computing in large networks (Summer Semester 2021)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Dr. Katharina Baum (Data Analytics and Computational Statistics), Dr. Athar Khodabakhsh (Data Analytics and Computational Statistics)

General Information

  • Weekly contact hours: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 18.03.2021 - 09.04.2021
  • Teaching format: Seminar
  • Module type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 8

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
  • DATA: Data Analytics
    • HPI-DATA-K Concepts and Methods
    • HPI-DATA-T Techniques and Tools
    • HPI-DATA-S Specialization
  • PREP: Data Preparation
    • HPI-PREP-K Concepts and Methods
    • HPI-PREP-T Techniques and Tools
    • HPI-PREP-S Specialization
  • CODS: Complex Data Systems
    • HPI-CODS-K Concepts and Methods
    • HPI-CODS-T Techniques and Tools
    • HPI-CODS-S Specialization
Cybersecurity MA
Digital Health MA

Description

The world is complex, and so is its data. Much of this complexity stems from relationships between entities, which call for representing the data as networks (graphs). How to analyze or characterize networks computationally, especially large ones, remains a challenging problem.

In this seminar, you will work hands-on with data in large, rather dense networks of more than 100,000 nodes. The networks will be built from real-world data, with applications ranging from tracing COVID-19 transmissions and investigating Amazon's trading to finding bottlenecks in traffic routes or detecting redundancies in networks of biological molecules; other areas are possible according to your interests.

The goal is to implement recent computational methods for network analysis. You will apply different methods and compare their runtime, memory consumption and feasibility when implemented by yourself and applied to datasets other than those used in the original publications. Potential analysis methods include, for example, vertex-centric strategies (Google's web search ranking uses one of those!), random walks or node embeddings.
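To give a flavor of the vertex-centric ("think like a vertex") style, here is a minimal single-machine sketch of PageRank, the classic example behind web search ranking, written as a sequence of supersteps: in each superstep every vertex sends messages along its out-edges and then updates its own value from the messages it received. The function name, the dict-of-lists graph format, the even redistribution of rank from vertices without out-edges, and the defaults (damping 0.85, 30 supersteps) are illustrative assumptions; dedicated vertex-centric frameworks (such as the one in reference 2) run this send/update pattern far more efficiently, and this sketch does not use their API.

```python
from collections import defaultdict


def vertex_centric_pagerank(out_edges, damping=0.85, supersteps=30):
    """Pregel-style PageRank sketch on a directed graph.

    out_edges: dict mapping every vertex to a list of its out-neighbours.
    Assumption: every vertex appears as a key, even with an empty list.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Phase 1: each vertex sends an equal share of its rank to its out-neighbours.
        inbox = defaultdict(float)  # messages are summed on arrival (a "combiner")
        dangling_mass = 0.0
        for v in vertices:
            targets = out_edges[v]
            if targets:
                share = rank[v] / len(targets)
                for u in targets:
                    inbox[u] += share
            else:
                # No out-edges: this rank is spread evenly over all vertices below.
                dangling_mass += rank[v]

        # Phase 2: each vertex computes its new value from the messages it received.
        base = (1.0 - damping) / n + damping * dangling_mass / n
        rank = {v: base + damping * inbox[v] for v in vertices}
    return rank
```

For example, `vertex_centric_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})` ranks `"c"` highest, since it collects rank from three vertices. Because each vertex only reads its own inbox, phase 2 could run in parallel over vertices, which is what makes the model attractive for large graphs.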

We will first introduce or recapitulate some basics of graphs and networks and of how data can be analyzed in a network context. You will then choose a specific network analysis method to implement from a pool of recent research papers (or, on request and if appropriate, a similar paper of your own choice). Next, you will select a dataset that interests you and meets certain criteria on network size and complexity, and apply your chosen method to (sub-)networks of different sizes. In doing so, you will assess the method's utility for the task at hand or for additional network analysis tasks and, potentially, compare its performance to that of other methods.
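One possible way to compare a method across (sub-)networks of different sizes is sketched below in Python: extract breadth-first-search (BFS) sub-networks of increasing size around a seed vertex, run the method on each, and record wall-clock time (`time.perf_counter`) and peak memory (`tracemalloc`). This is a sketch under stated assumptions, not a prescribed protocol: uniform random walks are only a hypothetical stand-in for whichever method you implement, `random_graph` is a placeholder for a real dataset, and `tracemalloc` tracks allocations made through the Python interpreter, so memory allocated directly by compiled extensions may not be counted.

```python
import random
import time
import tracemalloc
from collections import deque


def bfs_subgraph(adj, start, max_nodes):
    """Induced subgraph on (up to) the first max_nodes vertices reached by BFS."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_nodes:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen and len(seen) < max_nodes:
                seen.add(u)
                queue.append(u)
    # Keep only edges whose endpoints both lie in the sampled vertex set.
    return {v: [u for u in adj[v] if u in seen] for v in seen}


def random_walks(adj, walks_per_node=5, walk_length=20, seed=0):
    """Hypothetical stand-in method: uniform random walks from every vertex."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            # Stop early if the walk reaches a vertex without neighbours.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def benchmark(method, graph):
    """Return (wall-clock seconds, peak traced memory in bytes) for one call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    method(graph)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def random_graph(n, avg_degree, seed=1):
    """Hypothetical synthetic undirected graph (may contain repeated edges)."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for _ in range(n * avg_degree // 2):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            adj[a].append(b)
            adj[b].append(a)
    return adj


if __name__ == "__main__":
    full = random_graph(10_000, avg_degree=10)
    for size in (1_000, 2_500, 10_000):
        sub = bfs_subgraph(full, start=0, max_nodes=size)
        seconds, peak = benchmark(random_walks, sub)
        print(f"{len(sub):>6} nodes: {seconds:.3f} s, peak {peak / 1e6:.1f} MB")
```

BFS sampling keeps each sub-network connected around the seed vertex; sampling vertices uniformly at random is a simpler alternative but tends to fragment sparse graphs into many isolated vertices. For runs on a computing cluster, the scheduler's own resource accounting is usually a more complete measure of memory use.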

Learning Objectives:

  • You will learn methods for analyzing complex data in a network format.
  • You will develop a feel for the computational difficulties and challenges that arise when dealing with complex, large-scale real-world data in a network context.
  • You will learn to assess in practice which tools are best suited for network analysis, and experience their limitations.
  • You will train your ability to engage critically with research publications and to find and consult secondary literature.
  • You will practice organizing work in a team of two and presenting and visualizing your results, both orally and in writing.

Prerequisites

  • Good programming knowledge in Python or another programming language is absolutely required. You should be able to independently re-implement and apply a rather complex method from its description in a research publication, and to write working, well-documented code for data preprocessing and analysis.
  • Given the size of the data we plan to analyze, some practical experience with a high-performance computing cluster is beneficial.
  • You should have a good command of written and spoken English. (The lecture will be given in English, but you can ask questions and submit solutions etc. in German.)
  • Basic knowledge of graphs and network analysis is beneficial, but you can refresh or acquire it along the way.

Literature

  1. Lyu T, Bing L, Zhang Z, Zhang Y. FOX: Fast Overlapping Community Detection Algorithm in Big Weighted Networks. Trans Soc Comput. 2020;3(3):Article 16. doi: 10.1145/3404970.
  2. Capelli LAR, Hu Z, Zakian TAK, Brown N, Bull JM. iPregel: Vertex-centric programmability vs memory efficiency and performance, why choose? Parallel Computing. 2019;86:45-56. doi: 10.1016/j.parco.2019.04.005.
  3. Grover A, Leskovec J. node2vec: Scalable Feature Learning for Networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, California, USA: Association for Computing Machinery; 2016. p. 855-64.

Teaching and Learning Formats

The majority of the seminar will consist of hands-on project work that includes programming, data preparation and analysis, and reporting; you are allowed to work in pairs on your project.

The first meetings will be held in a lecture-like format to recapitulate relevant basics of graph and network analysis and to introduce some example datasets as well as data analysis with networks. The subsequent weekly meetings will serve for short updates on your project status and will be highly interactive. Additional meetings for in-depth discussion of specific problems within a project can be arranged on request. In the last meetings, you will present your project and its results in a talk, and you will be asked to hand in a written report as well as your documented code.

Depending on the pandemic situation and the students' preferences, the course will be offered online (meetings via Zoom) or on-site.

Please enroll in the seminar's Moodle course, which we will use to share dial-in details and other relevant information:

https://moodle.hpi.de/course/view.php?id=150

Assessment

You will be expected to participate in the regular progress meetings and to write a short project proposal outlining your approach, including the method you want to implement, the dataset, and the type of network analysis. At the end, you will give a talk on your project and hand in your documented code and a final written report. The final grade is composed as follows:

  1. Project proposal (20%)
  2. Oral presentation of the final results of your project (30%)
  3. Quality of the documented code and written final report (50%)

Schedule

Mondays 13:30-15:00, starting April 12. Meeting times can be shifted according to the participants' needs.
