PhD Defense: Analyzing and Improving Very Deep Neural Networks: from Optimization, Generalization to Compression

Please click on the link to join the online PhD defense.

You should register via the link to get access to the online PhD defense.

Members of the defense committee:

Assistant Prof. Dr Tegawendé Bissyande, University of Luxembourg, Chairman
Prof. Dr Björn Ottersten, University of Luxembourg, Deputy Chairman
Assistant Prof. Dr Djamila Aouada, University of Luxembourg, Supervisor
Prof. Dr David Fofi, University of Burgundy, Le Creusot, France, Member
Assistant Prof. Dr Marwan Hassani, Eindhoven University of Technology, Eindhoven, The Netherlands, Member

Abstract:

Learning-based approaches have recently become popular for various computer vision tasks such as facial expression recognition, action recognition, banknote identification, image captioning and medical image segmentation. Such approaches allow the constructed model to learn features, which results in high performance. The backbone of most recent learning-based approaches is the deep neural network (DNN). Importantly, it is believed that increasing the depth of DNNs invariably leads to improved generalization performance. Thus, many state-of-the-art DNNs have over 30 layers of feature representations, and it is not uncommon to find DNNs with over 100 layers in the literature. However, training very deep DNNs, that is, those with over 15 layers, is not trivial. On the one hand, such very deep DNNs generally suffer from optimization problems. On the other hand, they are often overparameterized, so that they overfit the training data and hence incur generalization loss. Moreover, overparameterized DNNs are impractical for applications that require low latency, little Graphics Processing Unit (GPU) memory for operation and little memory for storage. Interestingly, skip connections of various forms have been shown to alleviate the difficulty of optimizing very deep DNNs.

In this thesis, we propose to improve the optimization and generalization of very deep DNNs, with and without skip connections, by reformulating their training schemes. Specifically, the proposed modifications allow the DNNs to achieve state-of-the-art results on several benchmark datasets.

The second part of the thesis presents theoretical analyses of DNNs with and without skip connections, based on several concepts from linear algebra and random matrix theory. The theoretical results provide new insights into why DNNs with skip connections are easier to optimize and generalize better than DNNs without skip connections. Ultimately, extensive experiments show that the theoretical results agree with the behavior of practical DNNs.

The third part of the thesis addresses the problem of compressing large DNNs into smaller models. To address the identified drawbacks of the conventional group LASSO for compressing large DNNs, the debiased elastic group least absolute shrinkage and selection operator (DEGL) is employed. Furthermore, layer-wise subspace learning (SL) of latent representations in large DNNs is proposed. The objective of SL is to learn a compressed latent space for large DNNs. In addition, it is observed that SL improves the performance of LASSO, which is widely known not to work well for compressing large DNNs. Extensive experiments are reported to validate the effectiveness of the different model compression approaches proposed in this thesis. Finally, the thesis addresses the problem of multi-modal learning, where data from different modalities are combined into useful representations for improved learning results. Different methods and applications for improving the performance of multi-modal learning are explored.
