Hasso-Plattner-Institut (25 Years of HPI)

Towards Generalizable Hierarchical Classification (Summer Semester 2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Fabio Malcher Miranda (Data Analytics and Computational Statistics)

General Information

  • Semester hours per week: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.04.2023 - 07.05.2023
  • Course format: Seminar
  • Enrollment type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 3

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Digital Health MA
Cybersecurity MA
  • SECA: Security Analytics
    • HPI-SECA-K Concepts and Methods
  • SECA: Security Analytics
    • HPI-SECA-T Techniques and Tools
  • SECA: Security Analytics
    • HPI-SECA-S Specialization
  • CYAD: Cyber Attack and Defense
    • HPI-CYAD-K Concepts and Methods
  • CYAD: Cyber Attack and Defense
    • HPI-CYAD-T Techniques and Tools
  • CYAD: Cyber Attack and Defense
    • HPI-CYAD-S Specialization
Software Systems Engineering MA

Description

While machine learning models commonly make a single prediction per instance, some predictive tasks require assigning more than one label to each instance. Furthermore, for some of these problems the labels form a hierarchy, i.e., they are structured as trees or directed acyclic graphs. For this specific class of problems, we have developed a library called HiClass [1] with generic implementations of local hierarchical classification algorithms, which enables users to quickly train and evaluate hierarchical models in different application domains. In this seminar, we aim to extend this library with support for explainability, improved prediction probabilities, and incremental/out-of-core learning.
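
To make the idea of local hierarchical classification more concrete, the following is a minimal sketch of one possible local strategy: one classifier per parent node, with top-down prediction. It is not the HiClass API or implementation; the names ROOT, NearestCentroid and LocalPerParentNode are hypothetical and chosen only for illustration, and the toy nearest-centroid model stands in for any local classifier. The sketch assumes a tree-shaped hierarchy in which every node label is unique.

```python
from collections import defaultdict

ROOT = "<root>"  # hypothetical sentinel marking the top of the hierarchy


class NearestCentroid:
    """Toy stand-in for any local classifier (illustrative only)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        # Average the accumulated feature sums to obtain one centroid per label.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids_, key=lambda label: sq_dist(self.centroids_[label]))


class LocalPerParentNode:
    """One local classifier per parent node; prediction walks the tree top-down.

    Assumes each label path in Y runs from the first level to a leaf and that
    node labels are unique across the whole hierarchy.
    """

    def fit(self, X, Y):
        # For every parent node, collect the samples that pass through it
        # together with the child node each sample continues to.
        groups = defaultdict(lambda: ([], []))
        for x, path in zip(X, Y):
            parent = ROOT
            for child in path:
                groups[parent][0].append(x)
                groups[parent][1].append(child)
                parent = child
        self.models_ = {
            parent: NearestCentroid().fit(xs, ys) for parent, (xs, ys) in groups.items()
        }
        return self

    def predict(self, X):
        paths = []
        for x in X:
            node, path = ROOT, []
            # Descend until reaching a node that has no local classifier (a leaf).
            while node in self.models_:
                node = self.models_[node].predict_one(x)
                path.append(node)
            paths.append(path)
        return paths


# Made-up toy data: two features, two-level label paths.
X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9],
           [5.0, 5.0], [5.2, 5.1], [6.0, 4.0], [6.1, 4.2]]
Y_train = [["mammal", "cat"], ["mammal", "cat"], ["mammal", "dog"], ["mammal", "dog"],
           ["bird", "eagle"], ["bird", "eagle"], ["bird", "parrot"], ["bird", "parrot"]]

model = LocalPerParentNode().fit(X_train, Y_train)
predicted = model.predict([[0.1, 0.0], [6.0, 4.1]])
print(predicted)
```

Because each local model only has to separate the children of a single node, the same wrapper could hold any classifier that offers fit and predict, which is the kind of generality the seminar topics (explainability, prediction probabilities, incremental learning) would need to preserve.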

Learning objectives:

  • You will improve your skills in developing efficient code;
  • You will practice test-driven development;
  • You will learn to plan and implement generalizable algorithms for hierarchical machine learning tasks;
  • You will deepen your teamwork skills;
  • You will acquire the ability to contribute to open-source projects.

Prerequisites

You should have some previous programming experience in Python and be familiar with, or willing to learn, test-driven development and other software engineering best practices, as well as machine learning methods and libraries, in particular scikit-learn. A good command of English is required to understand and discuss the assignments.

Literature

  1. Miranda, F. M., Köhnecke, N., & Renard, B. Y. (2023). HiClass: a Python library for local hierarchical classification compatible with scikit-learn. Journal of Machine Learning Research, 24(29), 1-17.
  2. Silla, C. N., & Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1), 31-72.
  3. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
  4. Molnar, C. (2020). Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/.
  5. McCallum, A., Rosenfeld, R., Mitchell, T. M., & Ng, A. Y. (1998, July). Improving text classification by shrinkage in a hierarchy of classes. In ICML (Vol. 98, pp. 359-367).
  6. Punera, K., & Ghosh, J. (2008, April). Enhanced hierarchical classification via isotonic smoothing. In Proceedings of the 17th International Conference on World Wide Web (pp. 151-160).

Learning and Teaching Formats

  • Seminar for master students
  • Language of instruction: English
  • Maximum number of participants: 3

Meetings will take place in room K-1.03. Visual aids will be used where appropriate to facilitate the discussion. Assignment details will be handed out during the meetings.

All interested participants must enroll by April 30 via email to fabio.malchermiranda(at)hpi.de or attend the first meeting.

It is possible to unenroll without consequences until 07.05.2023.

Assessment

Grading is based on several factors related to the success of the project, including:

  • Written report (80% overall: structure 10%, code quality 40%, documentation 30%);
  • A presentation due at the end of the semester (20%).

We will provide close support and regular feedback.

Schedule

The first session will be on 18.04.2023, from 17:00 to 18:30. If necessary, the dates of the following sessions can be adjusted according to the students' needs.
