Symbolic inference in materials science: Towards human-friendly artificial intelligence
The number of possible materials is practically infinite, yet only a few hundred thousand (inorganic) materials are known to exist, and even basic properties are systematically known for only a few of them. To speed up the identification and design of new, optimal materials for a desired property or process, strategies for quick and well-guided exploration of the materials space are urgently needed. A desirable strategy would be to start from a large body of experimental and/or computational data and, by means of artificial-intelligence (AI) methods, to identify as yet unseen patterns or structures in the data and, consequently, predictive (data-driven) models. This leads to the identification of maps (or charts) of materials in which different regions correspond to materials with different properties. The main challenge in building such maps is to find the appropriate descriptive parameters (called descriptors) that define these regions of interest. Here, I present novel artificial-intelligence methods, in the form of both predictive and descriptive symbolic inference, for the identification of descriptors and materials maps, tailored to work (also) with “small data”. I demonstrate the approaches by means of two case studies: a) the selective oxidation of propane, starting from an experimental dataset of a few vanadium-based catalysts that were carefully synthesized, fully characterized, and tested according to standardized protocols; b) CO2 conversion on oxide materials, starting from a dataset of ab initio data for pristine surfaces and for CO2 adsorbed on different sites of the oxide surfaces. I focus on the (verified) predictive power of the learned maps, which goes beyond the mere interpolation of more “traditional” AI approaches, and analyze current and future challenges.
Bio: Dr. Luca M. Ghiringhelli leads the group “Big-Data analytics for Materials Science” in the Novel Materials Discovery (NOMAD) Laboratory at the Fritz Haber Institute (FHI) and the Humboldt University in Berlin. Previously, he led the group “Ab initio statistical mechanics of cluster catalysis and corrosion” in the theory department at the FHI. His background is in computational statistical mechanics and electronic structure methods, applied to the evaluation of thermodynamic and kinetic properties of bulk materials, surfaces, and nano-clusters. Within the NOMAD Lab, he leads the development and application of methods based on compressed sensing, symbolic regression, subgroup discovery, and deep learning for the modelling of big (and not so big) data in materials science. His focus is on methods that yield interpretable models and can cope with “small data” for training. He also co-led the development of the hierarchical and extensible metadata infrastructure for the NOMAD Lab. Since January 2018, he has been co-leading the Psi-k working group on “High-throughput screening and data analytics”.