Analysis and Applications of Deep Learning Features on Visual Tasks

Shi, Kangdi

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/28152

Title:	Analysis and Applications of Deep Learning Features on Visual Tasks
Authors:	Shi, Kangdi
Advisor:	Chen, Jun
Department:	Electrical and Computer Engineering
Keywords:	Deep Learning;Image Inpainting;Image Retrieval;Computer Vision
Publication Date:	2022
Abstract:	Benefiting from hardware development, deep learning (DL) has become a popular research area in recent decades. The convolutional neural network (CNN) is a critical deep learning tool that has been utilized in many computer vision problems. Moreover, the data-driven approach has unleashed the CNN's potential to acquire impressive learning ability with minimal human supervision. Therefore, many computer vision problems have been brought into the spotlight again. In this thesis, we investigate the application of deep-learning-based methods, particularly the role of deep learning features, in two representative visual tasks: image retrieval and image inpainting. Image retrieval aims to find images in a dataset that are similar to a query image. In the proposed image retrieval method, we use canonical correlation analysis to explore the relationship between matching and non-matching features from a pre-trained CNN, and generate compact transformed features. The level of similarity between two images is determined by a hypothesis test regarding the joint distribution of transformed image feature pairs. The proposed approach is benchmarked against three popular statistical analysis methods: Linear Discriminant Analysis (LDA), Principal Component Analysis with whitening (PCAw), and Supervised Principal Component Analysis (SPCA). Our approach is shown to achieve competitive retrieval performance on the Oxford5k, Paris6k, rOxford, and rParis datasets. Moreover, an image inpainting framework is proposed to reconstruct the corrupted region in an image progressively. Specifically, we design a feature extraction network inspired by the Gaussian and Laplacian pyramids, which are commonly used to decompose an image into different frequency components. Furthermore, we use a two-branch iterative inpainting network to progressively recover the corrupted region on high- and low-frequency features respectively, and fuse the high- and low-frequency features from each iteration.
Moreover, an enhancement model is introduced that uses features from neighbouring iterations to further improve the features of intermediate iterations. The proposed network is evaluated on popular image inpainting datasets such as Paris StreetView, CelebA, and Places2. Extensive experiments demonstrate the validity of the proposed method and its competitive performance against the state of the art.
URI:	http://hdl.handle.net/11375/28152
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Description	Size	Format
Shi_Kangdi_finalsubmission202212_Ph.D.pdf	Open Access	30.61 MB	Adobe PDF
