An international team of researchers from the University of Luxembourg, Technische Universität Berlin (TU Berlin), the Berlin Institute for the Foundations of Learning and Data (BIFOLD), and Google DeepMind has developed a new machine learning model capable of simulating a wide variety of molecular systems – for example, large and complex biological molecules – with quantum-mechanical accuracy.
The new method, called SO3LR, combines the latest developments in neural network design with physical laws and was trained on a specially curated dataset of four million different molecular structures. This enables the model to be applied not only to large biomolecules like proteins, sugars, or cell membranes, but also to a broad spectrum of other molecules without the need for retraining. This universal applicability of SO3LR paves the way for accelerated drug discovery and a deeper understanding of molecular biology.
An interdisciplinary and international endeavour, the project was conceived by Uni.lu doctoral candidate Adil Kabylda and his PhD supervisor Prof. Alexandre Tkatchenko. As project lead, Kabylda developed and trained the model, then designed, performed, and analysed the simulations. This work, supported by an FNR AFR Individual PhD Fellowship, constitutes the final chapter of his PhD thesis, which is dedicated to atomistic-level (bio)molecular modelling.
The findings are now published in the prestigious Journal of the American Chemical Society (JACS).
Molecular dynamics (MD) simulations enable us to understand and predict the behavior of molecules. They allow for the description of molecular interactions over time and provide insights into their structure, dynamics, and functioning. The exact simulation of the interaction of large biomolecules could, for example, enable the development of new drugs without the need to first conduct time-consuming, material-intensive, and costly experiments.
For decades, scientists have been facing a fundamental trade-off: Methods were either fast but only approximate and not transferable between different molecules, or highly accurate but computationally extraordinarily expensive. This trade-off restricted the scope of accurate simulations to small systems with a few hundred atoms. Large and complex biomolecules – e.g. proteins or sugars – can contain tens of thousands of atoms, limiting our ability to accurately model and understand fundamental dynamic processes like protein folding or cell assembly.
Scaling AI-based approaches to large biomolecular systems
In recent years, AI-based models have started to bridge this gap between approximate (classical) methods and highly accurate (quantum mechanical) methods. Despite great advances in the field, a persistent challenge has been the scaling of AI-based approaches to large biomolecular systems of realistic size. Simply put, the atoms in a molecule interact not only with atoms that are nearby but also with atoms far away. The larger the molecule, the more important the long-range effects become. It is the lack of an accurate treatment of quantum effects at long distances between atoms that has hindered the adoption of these approaches for large and complex biomolecules.
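As a rough illustration of why long-range interactions matter more as systems grow, the following toy sketch counts how many atom pairs lie beyond a fixed cutoff distance. Everything in it is an assumption made for illustration: the atoms sit on a simple cubic grid, and the function name, spacing, and cutoff are hypothetical values, not quantities taken from the study.

```python
import math
from itertools import combinations, product


def fraction_beyond_cutoff(n_per_side, spacing=1.0, cutoff=3.0):
    """Return the share of atom pairs separated by more than `cutoff`.

    Toy model: atoms sit on a cubic grid with `n_per_side` points per
    edge. Real molecules are not grids; this only shows the trend.
    """
    points = [
        (x * spacing, y * spacing, z * spacing)
        for x, y, z in product(range(n_per_side), repeat=3)
    ]
    total = 0
    beyond = 0
    for a, b in combinations(points, 2):
        total += 1
        if math.dist(a, b) > cutoff:
            beyond += 1
    return beyond / total


if __name__ == "__main__":
    # Print the share of pairs outside the cutoff for growing systems.
    for n in (2, 4, 6):
        print(n ** 3, "atoms:", round(fraction_beyond_cutoff(n), 3))
```

In this toy setting, the share of pairs outside the cutoff grows with system size, so a model that only looks at nearby atoms ignores an ever larger part of the pairwise interactions.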
A hybrid approach to overcome a heap of challenges
The scientists designed the new SO3LR model using a hybrid approach. It divides the complex task of calculating the quantum mechanical interactions between the atoms into two complementary components: a fast and highly accurate machine learning model, which learns the complex quantum many-body interactions that occur at short and medium distances, is combined with universal, physically grounded equations that accurately describe the interactions between the atoms at long distances.
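The split into two components can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the SO3LR implementation: the learned short-range model is replaced by a simple placeholder function, a plain Coulomb term stands in as one familiar example of a long-range physical equation, and all names, constants, and units are hypothetical.

```python
import math

# Hypothetical constants for illustration only (not SO3LR parameters).
SHORT_RANGE_CUTOFF = 5.0  # the placeholder learned term vanishes beyond this
COULOMB_CONSTANT = 1.0    # arbitrary units


def pair_distances(positions):
    """Yield (i, j, distance) for every unique pair of atoms."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            yield i, j, math.dist(positions[i], positions[j])


def short_range_energy(positions):
    """Placeholder for the learned short- and medium-range component.

    A real model would be a trained neural network; here a smooth pair
    function that goes to zero at the cutoff plays that role.
    """
    energy = 0.0
    for _, _, r in pair_distances(positions):
        if r < SHORT_RANGE_CUTOFF:
            energy += (1.0 - r / SHORT_RANGE_CUTOFF) ** 2
    return energy


def long_range_energy(positions, charges):
    """Explicit physical component: Coulomb interaction over all pairs."""
    energy = 0.0
    for i, j, r in pair_distances(positions):
        energy += COULOMB_CONSTANT * charges[i] * charges[j] / r
    return energy


def total_energy(positions, charges):
    """Hybrid energy: placeholder learned part plus explicit physical part."""
    return short_range_energy(positions) + long_range_energy(positions, charges)


if __name__ == "__main__":
    far_apart = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    # Beyond the cutoff only the physical long-range term contributes.
    print(total_energy(far_apart, [1.0, -1.0]))
```

The design point is that the placeholder learned term only has to cover a limited neighbourhood around each atom, while the interactions at large separations are handled by a closed-form expression that needs no training data.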
“Reliable simulations at the biomolecular scale hinge on long-range effects, so SO3LR encodes them by design.”

Doctoral researcher
“This allows our model to focus its powerful learning capacity on capturing the complex quantum effects that traditional models are missing to date,” adds Thorben Frank, postdoctoral researcher at TU Berlin and BIFOLD Institute.
The second challenge that needed to be solved was the universal applicability of a single model to many different molecules. To this end, the team created an extensive and diverse dataset of over 4 million carefully curated molecular structures. This dataset has been a key factor in “teaching” SO3LR how to accurately describe the vast diversity of molecules that exist in nature, achieving a level of transferability beyond that of former methods.
To demonstrate the capabilities of SO3LR, the research team performed a series of challenging simulations for all four major types of biomolecules that can be found in nature. For example, they performed simulations of large biomolecular systems in an explicit water environment, including the crambin protein and a complex glycoprotein. They further performed simulations of a POPC lipid bilayer, which serves as a model system for human cell membranes.
“The crucial breakthrough with SO3LR lies in its universality. Instead of having to go through the lengthy and complex process of data generation and subsequent model training for every new molecule, we provide a single, ready-to-use foundation model. This saves researchers the time- and compute-intensive preparation steps and allows them to directly test hypotheses with quantum-mechanical accuracy,” explains Prof. Klaus-Robert Müller, Co-Director of BIFOLD.
“By combining machine learning with physical principles, we are opening the door to modelling realistic biological processes with quantum accuracy, which has profound implications for understanding health and disease and designing the next generation of drugs.”
Full professor in Theoretical Condensed Matter Physics