Event

Artificial Intelligence for Bioscientific Research – Dr Jannis Born

This is an online event, streamed via Webex.

Language Models for Molecular Discovery

The discovery of new molecules and materials with desired properties is essential to addressing global challenges in drug discovery, energy storage, and sustainability. Traditional molecular design approaches are often time-consuming and limited by human expertise and intuition. Recent advances in natural language processing, particularly transformer-based architectures, have given rise to “scientific language models” that operate on text-like representations of molecules, proteins, and chemical reactions.

In my presentation, I will explore how language models are improving molecular discovery by treating molecules as sequences that can be processed similarly to natural language. I will discuss recent breakthroughs in using these models for de novo drug design, molecular property prediction, and reaction chemistry. The talk will cover both generative approaches for hypothesis generation and predictive models that guide the discovery process, demonstrating how these methods can significantly accelerate the identification of promising molecular candidates.

I will also present the Generative Toolkit for Scientific Discovery (GT4SD), an open-source platform that democratizes access to state-of-the-art generative models and enables researchers to integrate these powerful tools into their discovery workflows with just a few lines of code.

About the speaker

Jannis Born is a Research Scientist at IBM Research in Zurich, Switzerland, specializing in AI for Science, Language Models, and Quantum Machine Learning. He obtained his PhD from ETH Zurich in 2022 for his pioneering work on language models for molecular design, conducted in collaboration with IBM Research. His research focuses on developing transformer-based architectures and generative models that can accelerate scientific discovery across chemistry, materials science, and drug discovery.

He is a key contributor to the Generative Toolkit for Scientific Discovery (GT4SD), an extensive open-source library that enables scientists to train and deploy state-of-the-art generative models for hypothesis generation in scientific research. His work has been instrumental in demonstrating how language models can contribute to accelerating the molecule discovery cycle, with promising applications in early-stage drug discovery, property prediction, and reaction chemistry. Before his PhD, he completed an M.Sc. in Neural Systems & Computation at ETH/UZH Zurich (with distinction) and a B.Sc. in Cognitive Science.

The Causal Analysis of Biomedical Data Lecture Series is supported by the Luxembourg National Research Fund (FNR) RESCOM Program.
