Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/28917
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Shirani, Shahram | - |
dc.contributor.author | Jenab, Maryam | - |
dc.date.accessioned | 2023-09-21T14:52:44Z | - |
dc.date.available | 2023-09-21T14:52:44Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://hdl.handle.net/11375/28917 | - |
dc.description.abstract | Predicting the perceived quality of images by the human visual system (HVS) has gained considerable interest. The HVS is the ultimate destination for most videos; before reaching it, however, a video undergoes degradation through compression and transmission. Given the bit rate limitations associated with video transmission and storage, video compression plays a crucial role in visual communication. In this thesis, we present a novel convolutional neural network (CNN)-based method for predicting the distortion of encoded video without performing compression. We employ two strategies to overcome the limited dataset size. First, instead of using samples scored by mean opinion scores (MOS), we use a closely related index, Video Multimethod Assessment Fusion (VMAF), which aligns with HVS-perceived quality and is easier to generate than MOS. Second, we train our CNN at the patch (a square area in a frame) level to increase the number of training samples and enhance prediction accuracy. The patch-level quality predictor is a deep neural network (DNN) consisting of a series of convolutional and pooling layers, followed by a regressor with fully connected layers. This network accepts patches of uncompressed video frames and patches of motion estimation (ME) maps as input and generates quality scores (VMAF) for the patches. We introduce and compare three patch-wise to frame-wise transformations for frame-level quality prediction, and we propose a method for predicting perceived quality for both intra-frames and inter-frames. The results demonstrate the excellent performance of our frame-level compression quality prediction method. Rate control (RC) plays a crucial role in compression algorithms by minimizing distortion under bit rate constraints; RC operates through bit allocation at two levels: block and frame.
Despite advancements in compression algorithms, conventional RC models still struggle to assign bit rates to frames with fast motion and scene changes. To address this issue, we propose a novel CNN-based method for bit rate prediction that overcomes the limitations of traditional RC models. Our approach consists of two phases: patch-level and frame-level bit rate prediction. The proposed network includes convolutional layers for extracting spatial and temporal features from video frames and pooling layers to prevent overfitting, followed by a regressor that predicts the bit rate of patches from their extracted features. We use the trained patch-wise CNN bit rate predictor for frame-level bit rate prediction by feeding the extracted patch features into the regressor. For intra-frame bit rate prediction, we use frame patches to extract spatial features; for inter-frame bit rate prediction, we additionally incorporate motion estimation (ME) maps and extract temporal patch features. Notably, our proposed method is the first end-to-end CNN-based RC method that operates without relying on hand-crafted features. Because it does not depend on encoding information from previous frames, such as their bit rates, our approach predicts the bit rate accurately even during scene changes. Previous research has demonstrated that reducing spatial and temporal redundancy before encoding can improve compression performance: if a video frame is downscaled before encoding and upscaled after decoding, the resulting frame exhibits higher quality than a conventionally encoded frame at the same bit rate. In a temporally adaptive encoder, the video's frame rate is down-converted before encoding and up-converted after decoding. However, the impact of redundancy reduction on compression efficiency varies with the video content.
Downscaling or down-converting can affect compression performance positively or negatively, depending on the characteristics of the video. Additionally, the bit rate used for video encoding is a critical factor in adaptive encoding: empirical results indicate that downscaling or down-converting at a low bit rate enhances the quality of the compressed video at the same overall bit rate. Consequently, we propose two spatial/temporal adaptive encoding methods: a machine learning approach and a CNN-based spatio-temporally adaptive encoder. Our machine learning method uses hand-crafted frame features to predict the minimum quantization parameter (QP) at the bit rate intersection beyond which encoding the downscaled video outperforms conventional encoding. The CNN-based adaptive encoding method predicts the QP at the intersection from spatial and temporal features extracted by the network. Both proposed methods surpass state-of-the-art adaptive encoding techniques, and the CNN-based adaptive encoder particularly excels in intra-frame encoding compared to the hand-crafted-feature machine learning approach. | en_US |
dc.language.iso | en | en_US |
dc.title | VIDEO ENCODING QUALITY AND BIT RATE PREDICTION, AND ITS APPLICATION IN RESOLUTION, AND FRAME-RATE ADAPTIVE ENCODING | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Electrical and Computer Engineering | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Jenab_Maryam_2023_Aug_PhD.pdf | | 8.03 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.