
Doctoral Defence: Braulio BLANCO LAMBRUSCHINI

The Doctoral School in Science and Engineering is happy to invite you to Braulio BLANCO LAMBRUSCHINI’s defence entitled

Leveraging Textual Data for Multidimensional Company Risk Prediction

Supervisor: Dr. Mats Håkan BRORSSON

Effective risk analysis and prediction are crucial for societal development, as they improve decisions about how financial resources are allocated. Most current risk prediction methods rely on financial statements, financial ratios, other numerical data, and short sequences of text. Financial data has proven to be an essential risk predictor. However, it offers only a limited view of a company’s overall risk level, since many other factors within a business can also contribute to risk. This work therefore proposes a multidimensional risk model that incorporates diverse information from reliable sources. Once this data has been collected and cleaned, additional information is extracted from it, new insights are generated, and networks of people, companies, and geographic locations are analyzed to predict the potential risk of a small or medium-sized enterprise (SME).

The work is structured around a processing workflow. Information is extracted from data provided by the partner company, and textual content is obtained from various official company documents using OCR and PDF-reading tools. Relationships between companies, audit firms, auditors, and notaries are then established from the information extracted from these texts and from additional data sources. Text-based risk prediction, geolocation, and automatic clustering of companies (based on the format and writing style of the documents’ authors) provide further risk insights. To integrate entities across different datasets, a character-embedding similarity search is proposed. All of this information is consolidated into a graph network, in which company relationships are analyzed to identify various risk factors and to complement the text-based risk assessment. Finally, an initial user interface is proposed to make the data easier to explore.
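The abstract mentions a character-embedding similarity search for integrating entities across datasets but does not describe how it works. The sketch below is a rough illustration only, and it makes a deliberate substitution. It replaces learned character embeddings with simple character trigram count vectors compared by cosine similarity. All function names, the company names, the trigram size, and the 0.5 threshold are hypothetical choices, not the method used in the thesis.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count character n-grams of a lower-cased, whitespace-normalised name.

    The name is padded with a space on each side so word boundaries produce n-grams too.
    (Hypothetical stand-in for a learned character embedding.)
    """
    text = f" {' '.join(name.lower().split())} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_entities(queries, candidates, threshold=0.5):
    """Map each query name to its most similar candidate, or None if no score reaches threshold.

    The threshold value is an illustrative assumption and would need tuning on real data.
    """
    candidate_vecs = {c: char_ngrams(c) for c in candidates}
    matches = {}
    for q in queries:
        q_vec = char_ngrams(q)
        # Brute-force scan over all candidates; a real system would use an index.
        best, score = max(
            ((c, cosine(q_vec, v)) for c, v in candidate_vecs.items()),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        matches[q] = best if score >= threshold else None
    return matches


if __name__ == "__main__":
    print(match_entities(
        ["ACME Consulting S.A.", "Zeta Holdings"],
        ["Acme Consulting SA", "Blue River Logistics"],
    ))
```

In this toy run, differently formatted spellings of the same hypothetical company share most of their trigrams and are linked. A name with no close counterpart is left unmatched instead of being forced onto the nearest candidate. A learned character-embedding model, as named in the abstract, would replace `char_ngrams` and is likely more robust to abbreviations and transliterations.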

In conclusion, this thesis successfully developed a data pipeline that processes information on Luxembourgish SMEs using publicly available data. The pipeline was enriched with advanced Machine Learning and Deep Learning techniques to assess company risk across multiple dimensions. This approach gives decision-makers deeper insights and enables more informed, strategic decisions. The findings suggest that these models could be adapted for use in other countries or scaled to analyze larger enterprises. The analysis could also be enhanced further by integrating supplementary data sources, such as social networks. Another possible improvement is employing more sophisticated methods, such as Graph Neural Networks, for data integration.