Machine Learning Force Fields: Towards Modelling Flexible Molecules
The aim of this work is to investigate whether existing machine learning force fields (MLFFs) can provide models that are simultaneously accurate and efficient, and that offer unprecedented insights into the (thermo)dynamics of realistic molecular systems. Using examples of molecular interactions that are pervasive in (bio)chemical systems, I show a counterintuitive strengthening of such interactions, as well as an unexpected prevalence of quantum nuclear fluctuations over thermal contributions at room temperature. I reveal that, for complex potential-energy surfaces (PESs), the predictions of state-of-the-art ML models (BPNN, SchNet, GAP, and sGDML) depend strongly on the descriptor used and on the region of the PES being sampled. Given this varying performance of MLFFs, I present a descriptor optimization scheme that simultaneously improves the accuracy and efficiency of ML models. My results show that the strategies commonly employed to construct both local and global descriptors need to be improved, because the optimal descriptors are a non-trivial combination of local and global features. The work presented in this thesis therefore highlights the potential of MLFFs to provide insights into chemical systems while, at the same time, disclosing the current limitations that prevent the construction of accurate MLFFs for more realistic systems. I also propose optimizing the description of interactions within an ML model as a valuable step towards obtaining efficient and accurate MLFFs for large and flexible molecules.