You are cordially invited to attend the PhD Defence of Alessandro TEMPERONI on Monday, the 3rd of July at 10.30 am in Room 1.040, 1st floor (MNO) – Belval Campus. You can also join the defence online via Webex.
Members of the defence committee:
- Chair: Prof. Dr. Pascal BOUVRY, University of Luxembourg
- Vice-chair: Prof. Dr. Ulrich SORGER, University of Luxembourg
- Member (Supervisor): Prof. Dr. Martin THEOBALD, University of Luxembourg
- Member: Prof. Dr. Paolo MERIALDO, Università Roma Tre
- Member: Prof. Dr. Gerhard WEIKUM, Max Planck Institute for Informatics
Abstract:
Since the inception of civilization, the aspiration to create machines capable of thinking has persisted. Over the centuries, this dream has gradually come to fruition, with Artificial Intelligence now emerging as a field with numerous applications and promising research avenues. As a subfield, Deep Learning (DL) is dedicated to developing algorithms that can discern patterns in data, empowering machines to make predictions, draw conclusions, and carry out intricate tasks. This thesis delves into the enhancement of DL performance from various perspectives, revealing the complexity of this field and the myriad ways it can be approached, while also highlighting the challenges of navigating through different levels of abstraction and maintaining focus on the problem at hand.
In our exploration, we discuss second-order methods, random embeddings, and relation extraction. We address the initialisation and optimisation of neural networks (NNs) by introducing a new approximated chain rule, which aims to enable rapid and systematic training of NNs. Given that NNs are use-case sensitive, researchers and practitioners must undergo a series of laborious steps before deploying and ultimately releasing a functional model. Although the challenges of training and optimising NNs are well understood, no single solution exists, and most contemporary approaches rely on simple, empirical heuristics.
In Part 2, our approximated chain rule for Hessian backpropagation goes beyond empirical first-order methods and lays a theoretical foundation for optimising and training NNs. We systematically evaluate our approach through experiments, showcasing the superior efficiency of second-order methods across multiple datasets.
In Part 3, we shift our focus to analysing the performance of random embeddings as a crucial tool for dimensionality reduction, as these embeddings play a significant role in both Machine Learning (ML) and DL algorithms. Our research demonstrates improved bounds for sparse random embeddings compared to previous state-of-the-art techniques, with considerable improvements across a range of real-world datasets. Our analysis strives to bridge the gap between theory and practice, providing robust and provable guarantees for sparse random embeddings, extending to Rademacher random embeddings, and offering non-oblivious insights into the input data.
Lastly, in Part 4, we delve into information extraction, specifically relation extraction. By examining the most advanced techniques and tools for extracting and analysing textual data, we demonstrate their applicability in real-world scenarios. Our approach combines distant supervision, few-shot learning, OpenIE, and various language models to enhance relation extraction, showcasing the capabilities of these methods through a simple and efficient approach to extracting relational labels from text. The diverse approaches and strategies discussed in this thesis collectively succeed in augmenting deep learning performance across various scenarios and applications.
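For readers unfamiliar with second-order training, the following is a minimal, generic sketch of how second-order information (a Hessian-vector product via double backpropagation in PyTorch) can be accessed during optimisation. It is purely illustrative and is not the approximated chain rule for Hessian backpropagation developed in the thesis.

```python
# Illustrative only: generic Hessian-vector product via double backprop in PyTorch.
# This is NOT the thesis's approximated chain rule; it only shows how
# second-order information can be obtained for a loss function.
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec, where H is the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_dot_vec = torch.dot(flat_grad, vec)
    hvp = torch.autograd.grad(grad_dot_vec, params)
    return torch.cat([h.reshape(-1) for h in hvp])

# Toy example: mean-squared error of a single linear map.
w = torch.randn(3, requires_grad=True)
x, y = torch.randn(10, 3), torch.randn(10)
loss = ((x @ w - y) ** 2).mean()
print(hessian_vector_product(loss, [w], torch.randn(3)))
```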
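As background for Part 3, here is a small sketch of a sparse random embedding in the spirit of Johnson–Lindenstrauss projections, with sparse random-sign entries and rescaling so that norms are preserved in expectation. The sparsity level, dimensions, and scaling below are illustrative assumptions; the improved bounds proved in the thesis are not reproduced here.

```python
# Illustrative only: a sparse random projection with scaled +/-1 entries.
import numpy as np

def sparse_random_embedding(d, k, sparsity=0.1, seed=None):
    """Return a k x d matrix whose entries are non-zero with probability
    `sparsity`, each a random sign scaled so norms are preserved on average."""
    rng = np.random.default_rng(seed)
    mask = rng.random((k, d)) < sparsity
    signs = rng.choice([-1.0, 1.0], size=(k, d))
    return mask * signs / np.sqrt(k * sparsity)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                    # high-dimensional input
S = sparse_random_embedding(1000, 64, seed=1)    # embed into 64 dimensions
print(np.linalg.norm(x), np.linalg.norm(S @ x))  # the two norms should be close
```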
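Finally, for Part 4, the toy snippet below sketches the basic idea behind distant supervision: sentences that mention both entities of a known knowledge-base triple receive that triple's relation as a weak label. The knowledge base, sentences, and relation names are made up for illustration, and the thesis's full pipeline (few-shot learning, OpenIE, and language models) is not shown.

```python
# Illustrative only: toy distant-supervision labelling for relation extraction.
# A sentence mentioning both entities of a KB triple is weakly labelled
# with that triple's relation.
kb = {
    ("Alessandro", "University of Luxembourg"): "studies_at",
    ("Luxembourg", "Belval"): "has_campus",
}

sentences = [
    "Alessandro defends his PhD at the University of Luxembourg.",
    "The Belval site hosts many institutes of Luxembourg.",
]

def distant_labels(sentences, kb):
    labels = []
    for sent in sentences:
        for (subj, obj), rel in kb.items():
            if subj in sent and obj in sent:
                labels.append((sent, subj, rel, obj))
    return labels

for example in distant_labels(sentences, kb):
    print(example)
```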