Hasso-Plattner-Institut (25 Years of HPI)

Programming life with deep learning II: protein language models (Summer Semester 2023)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Dr. Jakub Maciej Bartoszewicz (Data Analytics and Computational Statistics), Melania Maria Nowicka (Data Analytics and Computational Statistics)

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline: 01.04.2023 - 07.05.2023
  • Teaching Form: Seminar
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English
  • Maximum number of participants: 4

Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
  • DANA: Data Analytics
    • HPI-DANA-K Konzepte und Methoden
  • DANA: Data Analytics
    • HPI-DANA-T Techniken und Werkzeuge
  • DANA: Data Analytics
    • HPI-DANA-S Spezialisierung
  • CODS: Complex Data Systems
    • HPI-CODS-K Konzepte und Methoden
  • CODS: Complex Data Systems
    • HPI-CODS-T Techniken und Werkzeuge
  • CODS: Complex Data Systems
    • HPI-CODS-S Spezialisierung

Description

Artificial neural networks are increasingly used in bioinformatics and computational biology. They can learn complex patterns in biological molecules and predict features such as function, activity, and 3D structure (shape). Recent breakthroughs enable AI-guided design of new molecules, from small chemical compounds to large proteins.

In this seminar, we will delve deeper into applications of protein language models in the context of protein design. You will have an opportunity to build on your own previous experience and ideas. You will design and complete a mini-project using a protein language model of your choice, exploring protein sequence generation, property/function prediction, or both. We will also discuss responsible research and the ethical aspects of your work with you individually, both during and after completion of the projects.

Learning objectives:

  • You will explore the limits of using deep learning to predict properties of novel proteins 
  • You will learn how to leverage protein language models to generate new sequences with properties of interest
  • You will extend your skills in engaging with current scientific literature in the field and in applying its results in practice
  • You will learn how to efficiently combine different methods to design and conduct a scientific mini-project of your own
  • You will learn how to clearly share your results with others using code, written word, and scientific talks

Requirements

Previous experience with protein language models for protein design and/or property prediction is necessary. Successful completion of the projects in the seminar "Programming life with deep learning: design your own molecule" (https://hpi.de/studium/im-studium/lehrveranstaltungen/it-systems-engineering-ma/lehrveranstaltung/wise-22-23-3637-programming-life-with-deep-learning-design-your-own-molecule.html) is therefore a requirement for this course. Good English skills are also required to participate.

Literature

Hesslow et al., RITA: a Study on Scaling Up Generative Protein Sequence Models, arXiv, 2022, https://arxiv.org/abs/2205.05789

Ferruz et al., ProtGPT2 is a deep unsupervised language model for protein design, Nature Communications, 2022, https://www.nature.com/articles/s41467-022-32007-7

Dallago et al., FLIP: Benchmark tasks in fitness landscape inference for proteins, NeurIPS 2021 Datasets and Benchmarks Track, 2021, https://openreview.net/forum?id=p2dMLEwL8tF

Lin et al., Evolutionary-scale prediction of atomic-level protein structure with a language model, Science, 2023, https://www.science.org/doi/10.1126/science.ade2574

Nijkamp et al., ProGen2: Exploring the Boundaries of Protein Language Models, arXiv, 2022, https://arxiv.org/abs/2206.13517

Learning

  • Seminar for master's students
  • Language of instruction: English
  • Maximum number of participants: 5

You will collaborate with your team members to design your mini-project with our support, and give an introductory talk about the planned steps. Regular meetings will mostly serve for interactive updates on the progress of the projects and for additional support from us. At the end of the seminar, you will present the final results in a talk and hand in a written report, including reproducible code documenting all steps of your workflow.

The seminar will be conducted on-site, with a hybrid/remote option whenever needed. To register, please contact us via e-mail at: jakub.bartoszewicz(at)hpi.de and melania.nowicka(at)hpi.de

Topics will be presented in the first meeting. Preferences must be declared via e-mail by the end of the first week of the semester, and topics will be assigned in the second meeting of the class. The deadline to de-register from the course is the end of the third week (May 5, 2023).

Examination

The final grade consists of the following parts:

  • Oral presentation of the final results of your project (25%)
  • Final report (including research methodology and code, 75%)

Dates

XXXXXXXXX
