Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/26456
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | He, Wenbo | - |
dc.contributor.author | Bo, Yang | - |
dc.date.accessioned | 2021-05-14T19:54:03Z | - |
dc.date.available | 2021-05-14T19:54:03Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | http://hdl.handle.net/11375/26456 | - |
dc.description.abstract | Over the past decade, research on deep learning has progressed dramatically, achieving strong performance on a wide range of tasks. This success depends heavily on large amounts of manually labeled data. However, it is not always possible to collect enough training data, and manually labelling large datasets is labour-intensive. To learn from a limited number of examples with supervised information, a new machine learning paradigm called Few-Shot Learning (FSL) has been introduced. For few-shot human action recognition, the core challenge is preserving both the spatial and the temporal information of a video when only a few labeled videos are available. Many approaches based on Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) have been proposed for human action recognition. However, these works fail to preserve the temporal information of the entire video and require a large number of training videos, so applying them directly to few-shot human action recognition leads to severe overfitting. Currently, only a few approaches address the few-shot human action recognition problem; they either focus on learning to compare the similarity between video descriptors from few training samples, or mitigate the shortage of training samples through data augmentation. In this thesis, we propose three approaches that preserve the temporal information of the entire video given frame/segment features: the Discriminative Video Descriptor (DVD), the Temporal Attention Vector (TAV), and Contents and Length based Temporal Attention (CLTA).
These methods preserve the temporal information of the entire video as follows: DVD recursively convolves frame features with the basis of a low-dimensional space; TAV aggregates frame/segment features with manually defined temporal weights; and CLTA aggregates frame features with temporal weights produced by learned Gaussian distribution functions based on both the length and the content of the video. We evaluate our approaches on several datasets in both regular and few-shot scenarios, achieving results comparable to or better than state-of-the-art approaches. | en_US |
dc.language.iso | en | en_US |
dc.title | Temporal Aggregation Approaches for Few-shot Human Action Recognition in Videos | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Computing and Software | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
dc.description.layabstract | The core challenge of human action recognition in videos is generating a descriptor that preserves both the spatial information (the content of a single frame) and the temporal information (the correlation between frames) of the video. Different actions require different numbers of frames to represent: for example, it is hard to distinguish walking from running given a single frame. Current approaches adopt Deep Neural Networks (DNNs) to learn the video descriptor. However, they fail to preserve the temporal information of the entire video and require a large number of training videos, so they cannot be used directly in the few-shot scenario. In this thesis, we propose three video descriptor generation approaches that preserve the temporal information of the entire video while introducing only a few training parameters. We show that our approaches achieve performance comparable to or better than state-of-the-art approaches on both regular and few-shot human action recognition tasks. | en_US |
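The abstract's central idea, aggregating per-frame features into a single fixed-size video descriptor using Gaussian temporal weights, can be illustrated with a minimal NumPy sketch. This is not the thesis's actual CLTA implementation: here `mu` and `sigma` are fixed hypothetical parameters, whereas the thesis learns the Gaussian functions from the length and content of the video.

```python
import numpy as np

def gaussian_temporal_aggregate(frame_features, mu=0.5, sigma=0.2):
    """Aggregate a (T, D) array of per-frame features into one
    D-dimensional video descriptor using Gaussian temporal weights.

    Illustrative sketch only; mu and sigma are fixed here, not learned.
    """
    T = frame_features.shape[0]
    # Normalized temporal positions of the frames in [0, 1]
    t = np.linspace(0.0, 1.0, T)
    # Gaussian weight for each frame, centered at mu with spread sigma
    w = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    w /= w.sum()  # normalize so the weights sum to 1
    # Weighted sum over time yields a fixed-size descriptor
    return (w[:, None] * frame_features).sum(axis=0)

# Example: 16 frames with 512-dimensional features each
feats = np.random.randn(16, 512)
desc = gaussian_temporal_aggregate(feats)
print(desc.shape)  # (512,)
```

Because the weights are normalized, the descriptor is a convex combination of the frame features, so its size is independent of the number of frames, which is what allows videos of different lengths to be compared with few training samples.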
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Bo_Yang_202105_Doctor-of-Philosophy.pdf | | 4.46 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.