Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/26456
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | He, Wenbo | - |
dc.contributor.author | Bo, Yang | - |
dc.date.accessioned | 2021-05-14T19:54:03Z | - |
dc.date.available | 2021-05-14T19:54:03Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | http://hdl.handle.net/11375/26456 | - |
dc.description.abstract | Over the past decade, research on deep learning has progressed dramatically, achieving strong performance on a wide range of tasks. This success depends heavily on large amounts of manually labeled data. However, it is not always possible to collect enough training data, and manually labelling large datasets is labour-intensive. To learn from a limited number of examples with supervised information, a new machine learning paradigm called Few-Shot Learning (FSL) has been introduced. For few-shot human action recognition, the core challenge is preserving both the spatial and the temporal information of a video when only a few labeled videos are available. Many approaches based on Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) have been proposed for human action recognition. However, these works fail to preserve the temporal information of the entire video and require a large number of training videos, so applying them directly to few-shot human action recognition leads to severe overfitting. Currently, only a few approaches address the few-shot human action recognition problem; they either focus on learning to compare the similarity between video descriptors from few training samples, or mitigate the shortage of training samples through data augmentation. In this thesis, we propose three approaches that preserve the temporal information of the entire video given frame/segment features: the Discriminative Video Descriptor (DVD), the Temporal Attention Vector (TAV), and Contents and Length based Temporal Attention (CLTA).
These methods preserve the temporal information of the entire video as follows: DVD recursively convolves frame features with the basis of a low-dimensional space; TAV aggregates frame/segment features with manually defined temporal weights; and CLTA aggregates frame features with temporal weights produced by learned Gaussian distribution functions based on both the length and the content of the video. We evaluate our approaches on several datasets in both regular and few-shot scenarios, achieving results comparable to or better than state-of-the-art approaches. | en_US |
dc.language.iso | en | en_US |
dc.title | Temporal Aggregation Approaches for Few-shot Human Action Recognition in Videos | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Computing and Software | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
dc.description.layabstract | The core challenge of human action recognition in videos is generating a descriptor that preserves both the spatial information (the content of a single frame) and the temporal information (the correlation between frames) of the video. Different actions require different numbers of frames to represent: for example, it is hard to distinguish walking from running given a single frame. Current approaches adopt Deep Neural Networks (DNNs) to learn the video descriptor. However, they fail to preserve the temporal information of the entire video and require a large number of training videos, so they cannot be used directly in the few-shot scenario. In this thesis, we propose three video descriptor generation approaches that preserve the temporal information of the entire video while introducing only a few training parameters. We show that our approaches achieve performance comparable to or better than state-of-the-art approaches on both regular and few-shot human action recognition tasks. | en_US |
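The abstract's central idea, aggregating per-frame features into a single fixed-size video descriptor using Gaussian temporal weights, can be illustrated with a minimal NumPy sketch. This is not the thesis's actual CLTA implementation: here `mu` and `sigma` are fixed hypothetical parameters, whereas the thesis learns the Gaussian functions from the length and content of the video.

```python
import numpy as np

def gaussian_temporal_aggregate(frame_features, mu=0.5, sigma=0.2):
    """Aggregate a (T, D) array of per-frame features into one
    D-dimensional video descriptor using Gaussian temporal weights.

    Illustrative sketch only; mu and sigma are fixed here, not learned.
    """
    T = frame_features.shape[0]
    # Normalized temporal positions of the frames in [0, 1]
    t = np.linspace(0.0, 1.0, T)
    # Gaussian weight for each frame, centered at mu with spread sigma
    w = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    w /= w.sum()  # normalize so the weights sum to 1
    # Weighted sum over time yields a fixed-size descriptor
    return (w[:, None] * frame_features).sum(axis=0)

# Example: 16 frames with 512-dimensional features each
feats = np.random.randn(16, 512)
desc = gaussian_temporal_aggregate(feats)
print(desc.shape)  # (512,)
```

Because the weights are normalized, the descriptor is a convex combination of the frame features, so its size is independent of the number of frames, which is what allows videos of different lengths to be compared with few training samples.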
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Bo_Yang_202105_Doctor-of-Philosophy.pdf | | 4.46 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.