
Applications of Deep Learning to Visual Content Processing and Analysis

dc.contributor.advisor: Chen, Jun
dc.contributor.author: Liu, Xiaohong
dc.contributor.department: Electrical and Computer Engineering [en_US]
dc.date.accessioned: 2021-08-26T15:49:25Z
dc.date.available: 2021-08-26T15:49:25Z
dc.date.issued: 2021
dc.description.abstract: Advances in computer architecture and chip design have set the stage for the deep learning revolution by supplying enormous computational power. Deep learning is built upon neural networks, which can be regarded as universal function approximators. In contrast to model-based machine learning, where representative features are designed by human engineers, deep learning discovers desirable feature representations automatically, in a data-driven manner. This thesis discusses applications of deep learning to visual content processing and analysis. For visual content processing, two novel approaches, named LCVSR and RawVSR, are proposed to address common issues in the field of Video Super-Resolution (VSR). In LCVSR, a new mechanism based on local dynamic filters via Locally Connected (LC) layers is proposed to implicitly estimate and compensate for motion. It avoids the errors caused by inaccurate explicit estimation of flow maps. Moreover, a global refinement network is proposed to exploit non-local correlations and enhance the spatial consistency of super-resolved frames. In RawVSR, camera raw data, which records the primitive radiance information, is used to improve the reconstruction of High-Resolution (HR) frames. The developed network is in line with the real imaging pipeline, where the super-resolution process serves as a pre-processing unit of the Image Signal Processor (ISP). Moreover, a Successive Deep Inference (SDI) module is designed in accordance with the architectural principle suggested by a canonical decomposition result for Hidden Markov Model (HMM) inference, and a reconstruction module is built with elaborately designed Attention-based Residual Dense Blocks (ARDBs).
For visual content analysis, a new approach, named PSCC-Net, is proposed to detect and localize image manipulations. It consists of two paths: a top-down path that extracts local and global features from an input image, and a bottom-up path that first distinguishes manipulated images from pristine ones via a detection head and then localizes forged regions via a progressive mechanism, in which manipulation masks are estimated from small scales to large ones, each serving as a prior for the next-scale estimation. Moreover, a Spatio-Channel Correlation Module (SCCM) is proposed to capture both spatial and channel-wise correlations among extracted features, enabling the network to cope with a wide range of manipulation attacks. Extensive experiments show that the methods proposed in this thesis achieve state-of-the-art (SOTA) results and partially address issues left open by previous work. [en_US]
dc.description.degree: Doctor of Philosophy (PhD) [en_US]
dc.description.degreetype: Dissertation [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/26815
dc.language.iso: en_US [en_US]
dc.subject: Video Super-Resolution [en_US]
dc.subject: Image Manipulation Detection and Localization [en_US]
dc.title: Applications of Deep Learning to Visual Content Processing and Analysis [en_US]
dc.type: Thesis [en_US]
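
Illustrative note on the abstract: the "local dynamic filters" idea behind LCVSR can be pictured as filtering a frame with a different small kernel at every pixel, so that motion is compensated without an explicit flow map. The sketch below is a minimal pure-Python illustration of that general idea only. It is not the thesis's network: the function names are hypothetical, and the per-pixel kernels are set by hand here, whereas in the thesis they are obtained via Locally Connected layers.

```python
def apply_local_filters(frame, kernels):
    """Filter `frame` with a different k x k kernel at every pixel.

    frame:   H x W list of numbers
    kernels: H x W list of k x k lists (k odd). Hypothetical stand-in for
             the per-pixel filters a network would predict.
    """
    h, w = len(frame), len(frame[0])
    k = len(kernels[0][0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Neighbours outside the frame count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernels[y][x][dy + r][dx + r] * frame[yy][xx]
            out[y][x] = acc
    return out


def one_hot_kernel(dy, dx, k=3):
    """Kernel that copies the neighbour at offset (dy, dx)."""
    r = k // 2
    ker = [[0.0] * k for _ in range(k)]
    ker[dy + r][dx + r] = 1.0
    return ker


frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# Every pixel reads its right-hand neighbour, so content moves one pixel left.
shift_left = [[one_hot_kernel(0, 1) for _ in range(3)] for _ in range(3)]
shifted = apply_local_filters(frame, shift_left)
```

With one-hot kernels the filtering reduces to a per-pixel displacement, which is why such filters can stand in for explicit motion estimation; kernels with several non-zero weights would blend neighbouring pixels instead.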

Files

Original bundle

Name: Liu_Xiaohong_202107_PhD.pdf
Size: 11.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission