Hasso-Plattner-Institut | 25 Years of HPI

Towards Generalizable Hierarchical Classification (Summer Semester 2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Fabio Malcher Miranda (Data Analytics and Computational Statistics)

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline: 01.04.2023 - 07.05.2023
  • Teaching Form: Seminar
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English
  • Maximum number of participants: 3

Programs, Module Groups & Modules

IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Konzepte und Methoden (Concepts and Methods)
    • HPI-OSIS-T Techniken und Werkzeuge (Techniques and Tools)
    • HPI-OSIS-S Spezialisierung (Specialization)
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-K Konzepte und Methoden (Concepts and Methods)
    • HPI-SAMT-T Techniken und Werkzeuge (Techniques and Tools)
    • HPI-SAMT-S Spezialisierung (Specialization)
Data Engineering MA
Digital Health MA
Cybersecurity MA
Software Systems Engineering MA

Description

Machine learning models commonly make a single prediction per instance, but some predictive tasks require assigning more than one label to each instance. For some of these problems the labels additionally form a hierarchy, i.e., they are structured as trees or directed acyclic graphs. For this specific class of problems, we have developed a library called HiClass [1] with generic implementations of local hierarchical classification algorithms, which enables users to quickly train and evaluate hierarchical models in different application domains. In this seminar, we aim to extend this library with support for explainability, improved prediction probabilities, and incremental/out-of-core learning.
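To make the idea of local hierarchical classification concrete, the following is a minimal sketch of one local strategy, a classifier per parent node, written with the standard library only. It is not the HiClass API: the class names `NearestCentroid` and `LocalClassifierPerParentNodeSketch`, the `<root>` sentinel, and the toy data are all assumptions made for illustration, and the nearest-centroid model merely stands in for any scikit-learn style classifier. The sketch also assumes a tree-shaped hierarchy with unique node names.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Tiny illustrative classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # One centroid (feature-wise mean) per label.
        self.centroids_ = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids_, key=lambda label: dist(x, self.centroids_[label]))


class LocalClassifierPerParentNodeSketch:
    """Hypothetical sketch: one local classifier per parent node of a label tree.

    Assumes node names are unique across the whole hierarchy.
    """

    ROOT = "<root>"  # artificial root above the first level (an assumption)

    def fit(self, X, Y):
        # Y holds one root-to-leaf path per sample, e.g. ("animal", "cat").
        per_parent = defaultdict(lambda: ([], []))
        for x, path in zip(X, Y):
            parent = self.ROOT
            for child in path:
                xs, ys = per_parent[parent]
                xs.append(x)
                ys.append(child)
                parent = child
        # Each parent node gets a model that chooses among its children.
        self.models_ = {
            parent: NearestCentroid().fit(xs, ys)
            for parent, (xs, ys) in per_parent.items()
        }
        return self

    def predict(self, X):
        paths = []
        for x in X:
            node, path = self.ROOT, []
            # Walk top-down until a node without a local model (a leaf) is reached.
            while node in self.models_:
                node = self.models_[node].predict_one(x)
                path.append(node)
            paths.append(tuple(path))
        return paths


# Toy data: two features, two-level hierarchy.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9),
     (5.0, 5.0), (5.2, 5.1), (6.0, 4.0), (6.1, 4.2)]
Y = [("animal", "cat"), ("animal", "cat"), ("animal", "dog"), ("animal", "dog"),
     ("plant", "tree"), ("plant", "tree"), ("plant", "flower"), ("plant", "flower")]

model = LocalClassifierPerParentNodeSketch().fit(X, Y)
predictions = model.predict([(0.1, 0.0), (5.9, 4.1)])
print(predictions)
```

Predicting top-down through the tree guarantees that every predicted path is consistent with the hierarchy, which is one reason local strategies are attractive; directed acyclic graphs and the other local strategies would need additional handling that this sketch omits.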

Learning objectives:

  • You will improve your skills in developing efficient code;
  • You will practice test-driven development;
  • You will learn to plan and implement generalizable algorithms for hierarchical machine learning tasks;
  • You will deepen your teamwork skills;
  • You will acquire the ability to contribute to open-source projects.

Requirements

You should have some previous programming experience in Python, and familiarity with (or willingness to learn) test-driven development and other software engineering best practices, as well as machine learning methods and libraries, in particular scikit-learn. Good knowledge of English is required to understand and discuss the assignments.

Literature

  1. Miranda, F. M., Köhnecke, N., & Renard, B. Y. (2023). HiClass: a Python library for local hierarchical classification compatible with scikit-learn. Journal of Machine Learning Research, 24(29), 1-17.
  2. Silla, C. N., & Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1), 31-72.
  3. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
  4. Molnar, C. (2020). Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/.
  5. McCallum, A., Rosenfeld, R., Mitchell, T. M., & Ng, A. Y. (1998, July). Improving text classification by shrinkage in a hierarchy of classes. In ICML (Vol. 98, pp. 359-367).
  6. Punera, K., & Ghosh, J. (2008, April). Enhanced hierarchical classification via isotonic smoothing. In Proceedings of the 17th International Conference on World Wide Web (pp. 151-160).

Learning

  • Seminar for master's students
  • Language of instruction: English
  • Maximum number of participants: 3

Meetings will take place in room K-1.03. Visual aids will be used where appropriate to facilitate the discussion. Assignment details will be handed out during the sessions.

All interested participants must enroll by April 30 via email to fabio.malchermiranda(at)hpi.de or attend the first meeting.

It is possible to unenroll without consequences until 07.05.2023.

Examination

Grading is based on multiple factors concerning the project success, including:

  • Written report (80% overall: structure 10%, code quality 40%, and documentation 30%);
  • A presentation due at the end of the semester (20%).

We will provide close support and regular feedback.

Dates

The first session will be on 18.04.2023, from 17:00 to 18:30. If necessary, the following sessions can be rescheduled according to students' needs.
