Hasso-Plattner-Institut (25 Years of HPI)

Programming life with deep learning: design your own molecule (Winter Semester 2022/2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Dr. Jakub Maciej Bartoszewicz (Data Analytics and Computational Statistics), Melania Maria Nowicka (Data Analytics and Computational Statistics)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2022 - 31.10.2022
  • Examination date according to §9 (4) BAMA-O: 30.11.2022
  • Teaching format: Seminar
  • Module type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 3

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Digital Health MA
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-C Concepts and Methods
    • HPI-SCAD-T Technologies and Tools
    • HPI-SCAD-S Specialization
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-C Concepts and Methods
    • HPI-APAD-T Technologies and Tools
    • HPI-APAD-S Specialization
Cybersecurity MA
Software Systems Engineering MA

Description

Artificial neural networks are used more and more in bioinformatics and computational biology. They can learn complex patterns in biological molecules and predict various features such as function, activity, 3D structure (shape) and more. Recent breakthroughs enable AI-guided design of new molecules, from small chemical compounds to large proteins.

In this seminar, we will present the basic principles, challenges and methods of molecular design with deep neural networks, focusing on proteins: large molecules that perform countless different functions in all living organisms. You will learn the necessary biological background, including the different ways a protein can be represented, from character strings to 3D coordinates. We will then discuss a selection of state-of-the-art machine learning methods for proteins, such as models that generate novel protein sequences or computationally predict their properties. This will include ready-to-use networks with simple user interfaces trained by other groups. With our support, you will prepare a mini-project in which you apply your favorite methods to a target protein of your choice, either by choosing one of the example targets or by coming up with your own application. You will implement all steps of your workflow as reusable code that can be run either locally or in a remote environment. Finally, you will visualize and interpret your results and present them in a scientific talk.
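As a minimal illustration of the simplest protein representation mentioned above, a character string, the sketch below converts a protein sequence written in the one-letter amino acid code into a one-hot matrix, a common input format for deep learning models. The alphabet ordering, the helper name `one_hot_encode` and the example sequence are our own assumptions for illustration; they are not taken from any of the methods discussed in the seminar.

```python
from typing import List

# The 20 standard amino acids in one-letter code (the ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Hypothetical helper: one row per residue, with a single 1 at that residue's index."""
    rows = []
    for aa in sequence.upper():
        if aa not in AA_TO_INDEX:
            # Non-standard letters (e.g. B, X, Z) are rejected rather than guessed.
            raise ValueError(f"Non-standard amino acid: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_TO_INDEX[aa]] = 1
        rows.append(row)
    return rows


# Made-up example sequence of 10 residues.
encoded = one_hot_encode("MKTAYIAKQR")
print(len(encoded), len(encoded[0]))  # prints: 10 20
```

Real models often use learned embeddings or richer representations (such as 3D coordinates) instead, but a one-hot matrix is a useful baseline for understanding what a sequence-based network receives as input.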

We will also talk about responsible research and the ethical aspects of your projects, as well as of AI for synthetic biology and protein design in general. If you would like to learn more about this topic, we recommend a separate discussion seminar running this semester: “From fairness to cyberbiosecurity: accountability in machine learning for biology and medicine”.

Learning objectives:

  • You will learn how to use free resources to generate new proteins
  • You will learn how to predict protein sequence, structure and function
  • You will learn how to engage with current scientific literature in the field and how to use its results in practice
  • You will learn how to combine different methods to design and conduct a scientific mini-project of your own
  • You will learn how to effectively share your results with others using interactive notebooks, reproducible code, and scientific talks

Prerequisites

Biological background is not necessary to participate in the seminar, but you will need at least a basic understanding of deep learning, as well as Python coding skills. Experience with Jupyter notebooks and deep learning frameworks such as TensorFlow or PyTorch will be helpful. An introduction to the basic biology relevant to the seminar topic will be provided. Good English skills are required to participate.

Literature

  1. Dauparas et al., Robust deep learning–based protein sequence design using ProteinMPNN, Science, 2022, https://www.science.org/doi/10.1126/science.add2187
  2. Ferruz et al., ProtGPT2 is a deep unsupervised language model for protein design, Nat Commun, 2022, https://www.nature.com/articles/s41467-022-32007-7
  3. Nijkamp et al., ProGen2: Exploring the Boundaries of Protein Language Models, arXiv, 2022, https://arxiv.org/abs/2206.13517
  4. Sanderson et al., ProteInfer: deep networks for protein functional inference, bioRxiv, 2022, https://www.biorxiv.org/content/10.1101/2021.09.20.461077v2
  5. Bileschi et al., Using deep learning to annotate the protein universe, Nat Biotechnol, 2022, https://www.nature.com/articles/s41587-021-01179-w
  6. Mirdita et al., ColabFold: making protein folding accessible to all, Nat Methods, 2022, https://www.nature.com/articles/s41592-022-01488-1
  7. Wu et al., High-resolution de novo structure prediction from primary sequence, bioRxiv, 2022, https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1
  8. Dallago et al., FLIP: Benchmark tasks in fitness landscape inference for proteins, NeurIPS 2021 Datasets and Benchmarks Track, 2021, https://openreview.net/forum?id=p2dMLEwL8tF

Teaching and Learning Formats

  • Seminar for master's students
  • Language of instruction: English
  • Maximum number of participants: 3

The first two meetings will be held in lecture format, giving you the opportunity to learn the necessary background and providing an overview of a selection of available methods. You will then design your mini-project with our support and present an introductory talk about the planned steps. You may collaborate with other participants if you wish, as long as individual contributions are clearly described. The subsequent weekly meetings will mostly serve for interactive progress updates on the projects and regular support from us. At the end of the semester, you will present your final results in a talk and hand in a written report, including reproducible code that documents all steps of your workflow.

The seminar will be conducted on-site, with a hybrid option whenever needed. Please register in the course's Moodle for further information.

Assessment

The final grade consists of the following parts:

  • Oral presentation introducing the project (10%)
  • Oral presentation of the final results of your project (40%)
  • Final report and code (50%)
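Since the final grade is a weighted combination of the three parts listed above, the arithmetic can be sketched as a weighted average. This is a minimal sketch assuming that all parts are scored on the same numeric scale; the actual grading scale and any rounding rules used in the course are not specified here, and the part names and example scores are hypothetical.

```python
import math
from typing import Dict

# Weights of the three graded parts, as listed in the grading breakdown.
WEIGHTS: Dict[str, float] = {
    "intro_presentation": 0.10,
    "final_presentation": 0.40,
    "report_and_code": 0.50,
}
# Sanity check: the weights should cover 100% of the grade.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def final_score(part_scores: Dict[str, float]) -> float:
    """Hypothetical helper: weighted average of the three part scores.

    All part scores are assumed to be on the same numeric scale.
    """
    if set(part_scores) != set(WEIGHTS):
        raise ValueError("Expected exactly one score per graded part")
    return sum(WEIGHTS[part] * score for part, score in part_scores.items())


# Hypothetical scores on a 0-100 scale, for illustration only.
example = {"intro_presentation": 80, "final_presentation": 90, "report_and_code": 70}
print(round(final_score(example), 2))  # prints: 79.0
```

With these weights, the final presentation and the report together account for 90% of the grade, so the introductory talk mainly serves as early feedback on the project plan.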

Dates

The kick-off meeting will be held on 26.10.2022 from 13:30 to 15:00. The first presentations are planned for 30.11.2022 (opt-out by 23.11.2022).