The Doctoral School in Sciences and Engineering is happy to invite you to Evelyn Yinghua LI’s defence entitled
Test Input Prioritization for Deep Neural Networks
Supervisor: Assoc. Prof. Tegawendé BISSYANDE
The rapid adoption of deep neural networks (DNNs) has revolutionized machine learning in several domains. As a result, thorough evaluation and validation of DNNs are crucial for ensuring their effectiveness. Testing DNNs, however, is challenging due to three key issues: 1) test inputs are still mostly labeled manually; 2) test sets can be very large; and 3) labeling can require domain-specific knowledge. To reduce labeling costs, one promising approach is test prioritization, which identifies potentially misclassified test inputs and ranks them first so that they are labeled earlier. Early identification of such challenging inputs can accelerate DNN debugging and improve the efficiency of DNN testing. While existing test prioritization approaches for DNNs have proven effective in some cases, they show limitations when applied to more specialized scenarios.
In this dissertation, we focus on four specialized scenarios: video classification, graph neural network (GNN) classification, compressed DNN classification, and 3D shape classification. We propose new test prioritization methods tailored specifically to these scenarios and conduct empirical studies to demonstrate their effectiveness. Below, we present the core contributions.
1. Test prioritization for videos. To address the labeling-cost problem for video test inputs, we propose a novel test prioritization approach called VRank. The fundamental idea behind VRank is that test inputs situated closer to the decision boundary of the model are at a higher risk of being predicted incorrectly. To capture the spatial relationship between a video test input and the decision boundary, we design a series of feature generation strategies tailored to video inputs. Based on these strategies, VRank generates features for each input in the test set and uses them to perform test prioritization.
2. Test prioritization for GNNs. To alleviate the labeling-cost problem and improve the efficiency of GNN testing, we propose a GNN-oriented test prioritization approach, NodeRank. NodeRank leverages concepts from mutation testing to perform test prioritization. Its core premise is that if a test input (node) kills many mutated models, and its prediction changes under many mutations of the input, then it is more likely to be misclassified by the GNN model and should be given higher priority.
3. Test prioritization for compressed DNNs. To reduce labeling costs when testing compressed DNN models, we propose PriCod. PriCod leverages the behavior disparities caused by model compression, along with the embeddings of test inputs, to identify and prioritize potentially misclassified tests.
4. Test prioritization for 3D point clouds. To address the high labeling costs of 3D point cloud data, we propose a novel test prioritization approach, PCPrior. PCPrior relies on the premise that test inputs closer to the decision boundary of the model are more likely to be predicted incorrectly. Accordingly, we design a set of feature generation strategies tailored to 3D point clouds and use the generated features for test prioritization.
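The two premises that recur above (closeness to the decision boundary, and sensitivity to mutation) can be illustrated with a minimal Python sketch. This is not an implementation of VRank, NodeRank, PriCod, or PCPrior, whose feature generation strategies and ranking procedures are not detailed in this announcement. The function names are hypothetical, and the sketch makes two simplifying assumptions: the gap between the two highest predicted class probabilities serves as a rough proxy for distance to the decision boundary, and a mutant counts as "killed" by an input when its predicted class differs from the original model's.

```python
from typing import List, Sequence


def margin_score(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities.

    Assumption: a small gap is used as a proxy for the input lying
    close to the model's decision boundary.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def prioritize_by_margin(prob_rows: Sequence[Sequence[float]]) -> List[int]:
    """Return test-input indices, smallest margin (most suspicious) first."""
    return sorted(range(len(prob_rows)), key=lambda i: margin_score(prob_rows[i]))


def kill_count(original_pred: int, mutant_preds: Sequence[int]) -> int:
    """Number of mutants whose predicted class differs from the original's."""
    return sum(1 for p in mutant_preds if p != original_pred)


def prioritize_by_kills(
    original_preds: Sequence[int],
    mutant_preds_per_input: Sequence[Sequence[int]],
) -> List[int]:
    """Return test-input indices, highest kill count (most suspicious) first."""
    kills = [
        kill_count(orig, mutants)
        for orig, mutants in zip(original_preds, mutant_preds_per_input)
    ]
    return sorted(range(len(kills)), key=lambda i: -kills[i])


if __name__ == "__main__":
    # Hypothetical softmax outputs for three unlabeled test inputs.
    probs = [
        [0.90, 0.05, 0.05],  # confident prediction, large margin
        [0.40, 0.35, 0.25],  # uncertain prediction, small margin
        [0.60, 0.30, 0.10],
    ]
    print(prioritize_by_margin(probs))  # [1, 2, 0]

    # Hypothetical predictions of the original model and of three mutants.
    original = [0, 1, 2]
    mutants = [[0, 0, 0], [1, 2, 0], [2, 2, 1]]
    print(prioritize_by_kills(original, mutants))  # [1, 2, 0]
```

In both rankings, the inputs at the front of the list would be labeled first, so that likely misclassifications surface early within a limited labeling budget.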
In summary, this dissertation proposes four new test prioritization methods, each tailored to a specialized DNN scenario, and demonstrates their effectiveness relative to the methods they are compared against.