Hasso-Plattner-Institut
 

Development of a Local Hierarchical Multi-label Classification Library (Wintersemester 2022/2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Fabio Malcher Miranda (Data Analytics and Computational Statistics)

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline: 01.10.2022 - 31.10.2022
  • Teaching Form: Seminar
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English
  • Maximum number of participants: 3

Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Digital Health MA
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-C Concepts and Methods
    • HPI-SCAD-T Technologies and Tools
    • HPI-SCAD-S Specialization
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-C Concepts and Methods
    • HPI-APAD-T Technologies and Tools
    • HPI-APAD-S Specialization
Cybersecurity MA
Software Systems Engineering MA

Description

While machine learning models commonly make a single prediction per instance, some predictive tasks require assigning more than one label to each instance. Furthermore, in some of these problems the labels form a hierarchy, i.e., they are structured as trees or directed acyclic graphs. For this specific class of problems, to the best of our knowledge, no library offers generic implementations of these algorithms that let users quickly train and evaluate local hierarchical multi-label models across different application domains.

Therefore, within this seminar we aim to extend the local hierarchical classification library HiClass [1] to add support for multi-label classification, using real data for evaluation.
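
To make the problem setting concrete, the following is a minimal sketch in plain Python of one possible way to represent hierarchical multi-label targets: each instance carries one or more root-to-leaf paths in a label tree. From these paths it derives the binary targets on which a local classifier per node could be trained. The toy labels and the helper `binary_targets_per_node` are hypothetical illustrations and are not part of the HiClass API.

```python
from collections import defaultdict

# Toy data (hypothetical): each instance has one or more root-to-leaf paths
# in a small label tree. The second instance carries two labels.
y = [
    [["Animal", "Mammal", "Dog"]],
    [["Animal", "Mammal", "Cat"], ["Animal", "Bird", "Parrot"]],
    [["Animal", "Bird", "Parrot"]],
]


def binary_targets_per_node(paths_per_instance):
    """Map each node to a 0/1 list with one entry per instance.

    An entry is 1 if at least one of the instance's paths visits the node.
    """
    n = len(paths_per_instance)
    targets = defaultdict(lambda: [0] * n)
    for i, paths in enumerate(paths_per_instance):
        for path in paths:
            for node in path:
                targets[node][i] = 1
    return dict(targets)


targets = binary_targets_per_node(y)
for node in sorted(targets):
    print(node, targets[node])
```

Each resulting 0/1 list could serve as the training target of one binary classifier, for example a scikit-learn estimator, attached to the corresponding node of the hierarchy.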

Learning objectives:

  • You will improve your skills in developing efficient code;
  • You will practice test-driven development;
  • You will learn to plan and implement generic algorithms for hierarchical machine learning tasks;
  • You will deepen your teamwork skills;
  • You will acquire the ability to contribute to open-source projects.

Requirements

You should have some previous experience programming in Python and be familiar with test-driven development and other software engineering best practices, as well as with machine learning methods and libraries, in particular scikit-learn. Good knowledge of English is required to understand and discuss the assignments.

Literature

  1. Miranda, F.M., Köhnecke, N. and Renard, B.Y., 2021. HiClass: a Python library for local hierarchical classification compatible with scikit-learn. arXiv preprint arXiv:2112.06560.
  2. Silla, C.N. and Freitas, A.A., 2011. A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1), pp.31-72.
  3. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, pp.2825-2830.

Learning

  • Seminar for master students
  • Language of instruction: English
  • Maximum number of participants: 3

Meetings will take place in room K-1.03. Visual aids will be used where appropriate to facilitate the discussion. Assignment details will be handed out during the meetings.

All interested participants must enroll by October 24, either via email to fabio.malchermiranda(at)hpi.de or by attending the first meeting.

It is possible to unenroll without consequences until 30.11.2022.

Examination

Grading is based on multiple factors concerning the project success, including:

  • Written report (80% overall: structure 10%, code quality 40%, and documentation 30%);
  • A presentation due at the end of the semester (20%).

We will provide close support and regular feedback.

Dates

The first session will be on 24.10.2022, from 09:15 to 10:45. If necessary, subsequent sessions can be rescheduled according to students' needs.
