Hasso-Plattner-Institut | 25 Years of HPI

Programming life with deep learning II: protein language models (Summer Semester 2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Dr. Jakub Maciej Bartoszewicz (Data Analytics and Computational Statistics), Melania Maria Nowicka (Data Analytics and Computational Statistics)

General Information

  • Semester hours per week: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.04.2023 - 07.05.2023
  • Course type: Seminar
  • Enrollment type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 4

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-K Concepts and Methods
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-T Techniques and Tools
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-S Specialization
Data Engineering MA


Artificial neural networks are increasingly used in bioinformatics and computational biology. They can learn complex patterns in biological molecules and predict features such as function, activity, and 3D structure (shape). Recent breakthroughs enable AI-guided design of new molecules, from small chemical compounds to large proteins.

In this seminar, we will delve deeper into applications of protein language models in the context of protein design. You will have an opportunity to build on your own previous experience and ideas. You will design and complete a mini-project using a protein language model of your choice, exploring protein sequence generation, property/function prediction, or both. We will also discuss responsible research and the ethical aspects of your work with you individually, both during and after completion of the projects.

Learning objectives:

  • You will explore the limits of using deep learning to predict properties of novel proteins
  • You will learn how to leverage protein language models to generate new sequences with properties of interest
  • You will strengthen your skills in engaging with current scientific literature in the field and in applying its results in practice
  • You will learn how to efficiently combine different methods to design and conduct a scientific mini-project of your own
  • You will learn how to clearly share your results with others through code, written reports, and scientific talks


Previous experience with protein language models for protein design and/or property prediction is necessary. Successful completion of the projects in the seminar Programming life with deep learning: design your own molecule (https://hpi.de/studium/im-studium/lehrveranstaltungen/it-systems-engineering-ma/lehrveranstaltung/wise-22-23-3637-programming-life-with-deep-learning-design-your-own-molecule.html) is therefore a requirement for this course. Good English skills are also required to participate.


Hesslow et al., RITA: a Study on Scaling Up Generative Protein Sequence Models, arXiv, 2022, https://arxiv.org/abs/2205.05789


Ferruz et al., ProtGPT2 is a deep unsupervised language model for protein design, Nature Communications, 2022, https://www.nature.com/articles/s41467-022-32007-7


Dallago et al., FLIP: Benchmark tasks in fitness landscape inference for proteins, NeurIPS 2021 Datasets and Benchmarks Track, 2021, https://openreview.net/forum?id=p2dMLEwL8tF


Lin et al., Evolutionary-scale prediction of atomic-level protein structure with a language model, Science, 2023, https://www.science.org/doi/10.1126/science.ade2574


Nijkamp et al., ProGen2: Exploring the Boundaries of Protein Language Models, arXiv, 2022, https://arxiv.org/abs/2206.13517

Learning and Teaching Formats

  • Seminar for master's students
  • Language of instruction: English
  • Maximum number of participants: 5

You will collaborate with your team members to design your mini-project with our support, and give an introductory talk about the planned steps. Regular meetings will serve mainly for interactive progress updates on the projects and for additional support from us. At the end of the seminar, you will present the final results in a talk and hand in a written report, including reproducible code, that documents all steps of your workflow.

The seminar will be conducted on-site, with a hybrid/remote option whenever needed. To register, please contact us via email at jakub.bartoszewicz(at)hpi.de and melania.nowicka(at)hpi.de.

Topics will be presented in the first meeting. Preferences must be declared via email by the end of the first week of the semester, and topics will be assigned in the second meeting of the class. The deadline to de-register from the course is the end of the third week (May 5, 2023).


The final grade consists of the following parts:

  • Oral presentation of the final results of your project (25%)
  • Final report (including research methodology and code, 75%)