Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/28917
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Shirani, Shahram | - |
dc.contributor.author | Jenab, Maryam | - |
dc.date.accessioned | 2023-09-21T14:52:44Z | - |
dc.date.available | 2023-09-21T14:52:44Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://hdl.handle.net/11375/28917 | - |
dc.description.abstract | Predicting the perceived quality of images by the human visual system (HVS) has gained considerable interest. The HVS is the ultimate destination for most videos; before reaching it, however, a video undergoes degradation through compression and transmission. Given the bit rate limitations associated with video transmission and storage, video compression plays a crucial role in visual communication. In this thesis, we present a novel convolutional neural network (CNN)-based method for predicting the distortion of encoded video without performing compression. We employ two strategies to overcome the limited dataset size. First, instead of using samples scored by mean opinion scores (MOS), we use a closely related index, Video Multimethod Assessment Fusion (VMAF), which aligns with HVS-perceived quality and is easier to generate than MOS. Second, we train our CNN at the patch (a square area in a frame) level to increase the number of training samples and enhance prediction accuracy. The patch-level quality predictor is a deep neural network (DNN) consisting of a series of convolutional and pooling layers, followed by a regressor with fully connected layers. This network accepts patches of uncompressed video frames and patches of motion estimation (ME) maps as input and generates quality scores (VMAF) for the patches. We introduce and compare three patch-wise to frame-wise transformations for frame-level quality prediction, and we propose a method for predicting perceived quality for both intra-frames and inter-frames. The results demonstrate the excellent performance of our frame-level compression quality prediction method. Rate control (RC) plays a crucial role in compression algorithms by minimizing distortion under bit rate constraints; RC operates through bit allocation at two levels: block and frame.
Despite advancements in compression algorithms, conventional RC models still struggle to assign bit rates to frames with fast motion and scene changes. To address this issue, we propose a novel CNN-based method for bit rate prediction that overcomes the limitations of traditional RC models. Our approach consists of two phases: patch-level and frame-level bit rate prediction. The proposed network includes convolutional layers for extracting spatial and temporal features from video frames and pooling layers to prevent overfitting, followed by a regressor that predicts the bit rate of patches from their extracted features. We use the trained patch-wise CNN bit rate predictor for frame-level bit rate prediction by feeding the extracted patch features into the regressor. For intra-frame bit rate prediction, we use frame patches to extract spatial features; for inter-frame bit rate prediction, we additionally incorporate motion estimation (ME) maps and extract temporal patch features. Notably, our proposed method is the first end-to-end CNN-based RC method that operates without relying on hand-crafted features. Because it does not depend on encoding information from previous frames, such as their bit rates, our approach predicts the bit rate accurately even during scene changes. Previous research has demonstrated that reducing spatial and temporal redundancy before encoding can improve compression performance: if a video frame is downscaled before encoding and upscaled after decoding, the resulting frame exhibits higher quality than a conventionally encoded frame at the same bit rate. In a temporally adaptive encoder, the video's frame rate is down-converted before encoding and up-converted after decoding. However, the impact of redundancy reduction on compression efficiency varies with the video content.
Downscaling or down-converting can affect compression performance positively or negatively, depending on the characteristics of the video. Additionally, the bit rate used for video encoding is a critical factor in adaptive encoding: empirical results indicate that downscaling or down-converting at a low bit rate enhances the quality of the compressed video at the same overall bit rate. Consequently, we propose two spatial/temporal adaptive encoding methods: a machine learning approach and a CNN-based spatio-temporally adaptive encoder. Our machine learning method uses hand-crafted frame features to predict the minimum quantization parameter (QP) at the bit rate intersection beyond which encoding the downscaled video outperforms conventional encoding. The CNN-based adaptive encoding method predicts the QP at the intersection from spatial and temporal features extracted by the network. Both proposed methods surpass state-of-the-art adaptive encoding techniques, and the CNN-based adaptive encoder particularly excels in intra-frame encoding compared to the hand-crafted-feature machine learning approach. | en_US |
dc.language.iso | en | en_US |
dc.title | VIDEO ENCODING QUALITY AND BIT RATE PREDICTION, AND ITS APPLICATION IN RESOLUTION, AND FRAME-RATE ADAPTIVE ENCODING | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Electrical and Computer Engineering | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Jenab_Maryam_2023_Aug_PhD.pdf | | 8.03 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.