Development of a Local Hierarchical Multi-label Classification Library (Wintersemester 2022/2023)
Lecturer: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Fabio Malcher Miranda (Data Analytics and Computational Statistics)
General Information
- Weekly Hours: 4
- Credits: 6
- Graded: yes
- Enrolment Deadline: 01.10.2022 - 31.10.2022
- Teaching Form: Seminar
- Enrolment Type: Compulsory Elective Module
- Course Language: English
- Maximum number of participants: 3
Programs, Module Groups & Modules
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Konzepte und Methoden
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniken und Werkzeuge
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Spezialisierung
- SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-K Konzepte und Methoden
- SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-S Spezialisierung
- SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-T Techniken und Werkzeuge
- DANA: Data Analytics
- HPI-DANA-K Konzepte und Methoden
- DANA: Data Analytics
- HPI-DANA-T Techniken und Werkzeuge
- DANA: Data Analytics
- HPI-DANA-S Spezialisierung
- CODS: Complex Data Systems
- HPI-CODS-K Konzepte und Methoden
- CODS: Complex Data Systems
- HPI-CODS-T Techniken und Werkzeuge
- CODS: Complex Data Systems
- HPI-CODS-S Spezialisierung
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-C Concepts and Methods
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-T Technologies and Tools
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-S Specialization
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-C Concepts and Methods
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-T Technologies and Tools
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-S Specialization
- SECA: Security Analytics
- HPI-SECA-K Konzepte und Methoden
- SECA: Security Analytics
- HPI-SECA-T Techniken und Werkzeuge
- SECA: Security Analytics
- HPI-SECA-S Spezialisierung
- CYAD: Cyber Attack and Defense
- HPI-CYAD-K Konzepte und Methoden
- CYAD: Cyber Attack and Defense
- HPI-CYAD-T Techniken und Werkzeuge
- CYAD: Cyber Attack and Defense
- HPI-CYAD-S Spezialisierung
- MALA: Machine Learning and Analytics
- HPI-MALA-C Concepts and Methods
- MALA: Machine Learning and Analytics
- HPI-MALA-T Technologies and Tools
- MALA: Machine Learning and Analytics
- HPI-MALA-S Specialization
- MODA: Models and Algorithms
- HPI-MODA-C Concepts and Methods
- MODA: Models and Algorithms
- HPI-MODA-T Technologies and Tools
- MODA: Models and Algorithms
- HPI-MODA-S Specialization
Description
While machine learning models commonly make a single prediction per instance, some predictive tasks require assigning more than one label to each instance. Furthermore, in some of these problems the labels form a hierarchy, i.e., they are structured as trees or directed acyclic graphs. For this specific class of problems, as far as we know, no library offers generic implementations that let users quickly train and evaluate local hierarchical multi-label models across different application domains.
Therefore, within this seminar we aim to extend the local hierarchical classification library HiClass [1] to add support for multi-label classification, using real data for evaluation.
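To make the problem setting concrete, the following minimal sketch shows one possible way to represent hierarchical multi-label targets in plain Python. The toy hierarchy and the helper names `path_to_root` and `expand_labels` are illustrative assumptions; they are not part of HiClass or scikit-learn.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy hierarchy stored as child -> parent; the root has no parent.
PARENT: Dict[str, Optional[str]] = {
    "Animal": None,
    "Mammal": "Animal",
    "Bird": "Animal",
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Eagle": "Bird",
}


def path_to_root(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root of the tree down to `label`."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def expand_labels(labels: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the assigned labels together with all of their ancestors."""
    expanded: Set[str] = set()
    for label in labels:
        expanded.update(path_to_root(label, parent))
    return expanded


# A multi-label instance: one sample annotated with two leaves of the hierarchy.
instance_labels = ["Dog", "Eagle"]
instance_paths = [path_to_root(label, PARENT) for label in instance_labels]
instance_label_set = expand_labels(instance_labels, PARENT)
```

In this sketch a single instance carries two root-to-leaf paths rather than one, which is the difference between multi-label and single-label hierarchical classification.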
Learning objectives:
- You will improve your skills in developing efficient code;
- You will practice test-driven development;
- You will learn to plan and implement generic algorithms for hierarchical machine learning tasks;
- You will deepen teamwork skills;
- You will acquire the ability to contribute to open-source projects.
Requirements
You should have some previous experience programming in Python and be familiar with test-driven development and other software engineering best practices, as well as with machine learning methods and libraries, in particular scikit-learn. Good knowledge of English is required to understand and discuss the assignments.
Literature
- Miranda, F. M., Köhnecke, N., and Renard, B. Y. (2021). HiClass: a Python library for local hierarchical classification compatible with scikit-learn. arXiv preprint arXiv:2112.06560.
- Silla, C. N. and Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1), pp. 31-72.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, pp. 2825-2830.
Learning
- Seminar for master students
- Language of instruction: English
- Maximum number of participants: 3
Meetings will take place in room K-1.03. Visual aids will be used where appropriate to facilitate the discussion. Assignment details will be handed out during the sessions.
All interested participants must enroll by October 24 via email to fabio.malchermiranda(at)hpi.de or attend the first meeting.
It is possible to unenroll without consequences until 30.11.2022.
Examination
Grading is based on multiple factors concerning the project success, including:
- Written report (80% overall: structure 10%, code quality 40%, and documentation 30%);
- A presentation due at the end of the semester (20%).
We will provide close support and regular feedback.
Dates
The first session will be on 24.10.2022, from 09:15 to 10:45. If necessary, the following sessions can be rescheduled according to students' needs.