Event

Causal Analysis of Biomedical Data – Prof. Gianluca Bontempi

This is a hybrid event. Join in person at the LCSB, or tune in remotely via Webex.

From feature selection to causal inference in large dimensional settings

“We are drowning in data and starving for knowledge” is an adage of data scientists that nowadays should be rephrased as “we are drowning in associations and starving for causality”. The democratisation of machine learning software and big data platforms increases the risk of ascribing causal meaning to simple and sometimes brittle associations. This risk is particularly evident in settings (like bioinformatics, social sciences, and economics) characterised by high dimensions, multivariate interactions, and dynamic behaviour, where direct manipulation is unethical or impractical.

The conventional ways to recover a causal structure from observational data are score-based and constraint-based algorithms. Their limitations, mainly in high dimensions, opened the way to alternative learning algorithms that pose the problem of causal inference as the classification of probability distributions. The rationale of those algorithms is that the existence of a causal relationship induces a constraint on the observational multivariate distribution. In other words, causality leaves footprints in the data distribution that can hopefully be used to reduce the uncertainty about the causal structure.

The first part of the presentation will introduce some basics of causal inference and will present a causal extension of a feature selection method commonly used in bioinformatics. The second part of the talk will focus on the D2C approach, which featurizes observed data using asymmetric information-theoretic measures to extract meaningful hints about the causal structure. The D2C algorithm performs three steps to predict the existence of a directed causal link between two variables in a multivariate setting: (i) it estimates the Markov blankets of the two variables of interest and ranks their components in terms of their causal nature, (ii) it computes several asymmetric descriptors, and (iii) it learns a classifier (e.g. a Random Forest) returning the probability of a causal link given the values of the descriptors.

More information coming soon.
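To give a feel for the idea of posing causal discovery as a classification problem, here is a minimal toy sketch in Python. It is not the D2C algorithm: it handles only two variables, skips the Markov blanket step (i) entirely, uses a single hand-made asymmetric descriptor (how unevenly the residual spread of a binned regression varies) in place of D2C's information-theoretic descriptors, and swaps the Random Forest for a small logistic regression so that it runs with the standard library only. The synthetic data model (effect = cause + uniform noise) and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def residual_spread(cause, effect, n_bins=10):
    """Variation of the residual scale of `effect` across quantile bins of `cause`.

    Each bin mean acts as a crude regression estimate; values near 0 suggest
    that the noise does not depend on the candidate cause."""
    pairs = sorted(zip(cause, effect))
    size = len(pairs) // n_bins
    bin_stds = [
        statistics.pstdev([e for _, e in pairs[i * size:(i + 1) * size]])
        for i in range(n_bins)
    ]
    return statistics.pstdev(bin_stds) / statistics.mean(bin_stds)


def descriptor(a, b):
    """Asymmetric descriptor: it changes sign when a and b are swapped."""
    return residual_spread(a, b) - residual_spread(b, a)


def make_pair(rng, n=400):
    """Synthetic pair with known direction (assumed model: effect = cause + uniform noise)."""
    half_width = rng.uniform(0.6, 1.2)
    cause = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    effect = [c + rng.uniform(-half_width, half_width) for c in cause]
    if rng.random() < 0.5:
        return cause, effect, 1  # label 1: first variable causes the second
    return effect, cause, 0      # label 0: second variable causes the first


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=5.0, epochs=500):
    """One-feature logistic regression fitted by batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for f, t in zip(features, labels):
            p = sigmoid(weight * f + bias)
            grad_w += (p - t) * f / n
            grad_b += (p - t) / n
        weight -= lr * grad_w
        bias -= lr * grad_b
    return weight, bias


def predict_link(a, b, weight, bias):
    """Estimated probability that a causes b, given the descriptor value."""
    return sigmoid(weight * descriptor(a, b) + bias)


# Train on synthetic pairs with known direction, then evaluate on fresh pairs.
rng = random.Random(0)
train = [make_pair(rng) for _ in range(60)]
test = [make_pair(rng) for _ in range(40)]
weight, bias = train_logistic(
    [descriptor(a, b) for a, b, _ in train],
    [label for _, _, label in train],
)
accuracy = statistics.mean(
    float((predict_link(a, b, weight, bias) > 0.5) == bool(label))
    for a, b, label in test
)
print(f"held-out accuracy on synthetic pairs: {accuracy:.2f}")
```

The point of the sketch is the workflow rather than the specific descriptor: the classifier is trained on data whose causal direction is known, and it then returns a probability for unseen pairs. On real biomedical data the simple noise model assumed here would generally not hold.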

About the speaker

Gianluca Bontempi is a Full Professor at the Université Libre de Bruxelles (ULB), Belgium, and founder and co-head of the ULB Machine Learning Group. He was Director of (IB)2, the ULB/VUB Interuniversity Institute of Bioinformatics in Brussels, from 2013 to 2017. His main research interests are big data mining, machine learning, bioinformatics, causal inference, predictive modeling, and their application to complex tasks in engineering (time series forecasting, fraud detection) and life science (network inference, gene signature extraction). He is the current president of the Belgian (French-speaking) TRAIL initiative, an IEEE Senior Member, and an associate editor of the International Journal of Forecasting.


The Causal Analysis of Biomedical Data Lecture Series is supported by the Luxembourg National Research Fund (FNR) RESCOM Program.
