Accelerating Object Detection and Tracking Pipelines for Efficient Edge Video Analytics

Xu, Renjie

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/32494

Title:	Accelerating Object Detection and Tracking Pipelines for Efficient Edge Video Analytics
Authors:	Xu, Renjie
Advisor:	Zheng, Rong; Razavi, Saiedeh
Department:	Computing and Software
Keywords:	Edge Computing;Video Analytics;Object Detection;Object Tracking;Computer Vision;Deep Learning
Publication Date:	2025
Abstract:	Edge computing enables rapid video analytics by processing data closer to the source, thereby reducing end-to-end latency. This gives rise to the paradigm of edge video analytics (EVA). Object detection and object tracking are key building blocks of video analytics pipelines (VAPs), as their outputs directly affect the performance of downstream tasks. In real-world applications such as traffic monitoring, timely and accurate responses are critical, because delayed or inaccurate results can compromise safety. However, balancing accuracy and efficiency at the edge is particularly challenging for two main reasons: modern Convolutional Neural Network (CNN)- and Vision Transformer (ViT)-based models are compute-intensive, and edge devices have limited computational and communication resources. This thesis aims to improve the efficiency of object detection and tracking pipelines without sacrificing accuracy, enabling efficient and reliable EVA. Conventional pipelines often adopt fixed configurations (e.g., frame resolution and backbone model) or process entire frames uniformly. Both practices overlook the dynamic and spatially diverse nature of video content and waste considerable resources. To address these limitations, we propose three novel approaches: FastTuner, a model-agnostic framework that dynamically selects the optimal frame resolution and backbone model at runtime to accelerate multi-object tracking (MOT) pipelines; BlockHybrid, which uses a policy network to classify the blocks of each frame as “hard” or “easy” and processes them with a block-wise detector or a lightweight tracker, respectively; and SEED, an end-to-end framework that couples block selection with block execution, enabling unified and efficient selection and execution of informative blocks in ViT-based object detectors. Extensive evaluations across multiple datasets and deployment scenarios demonstrate the effectiveness and generality of the proposed methods. Together, these contributions pave the way for more adaptive and scalable video analytics in real-world edge environments.
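Illustrative sketch (not taken from the thesis): the abstract names two selection ideas, choosing a frame resolution and backbone at runtime, and routing frame blocks to either a detector or a tracker. The Python below is a minimal sketch of both under stated assumptions. The `Config` fields, the accuracy and latency estimates, the score threshold, and the mapping of "hard" blocks to the detector are all hypothetical and stand in for whatever predictors and policy network the thesis actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int]  # (row, column) index of a block within a frame


@dataclass(frozen=True)
class Config:
    """Hypothetical pipeline configuration: input resolution plus backbone."""
    resolution: int        # frame height in pixels (assumed unit)
    backbone: str          # placeholder backbone name, not from the thesis
    est_accuracy: float    # assumed to come from a profile or a predictor
    est_latency_ms: float  # assumed per-frame latency estimate


def select_config(candidates: Sequence[Config], min_accuracy: float) -> Config:
    """Return the fastest candidate whose estimated accuracy meets the target.

    If no candidate meets the target, fall back to the most accurate one.
    """
    if not candidates:
        raise ValueError("at least one candidate configuration is required")
    feasible = [c for c in candidates if c.est_accuracy >= min_accuracy]
    if feasible:
        return min(feasible, key=lambda c: c.est_latency_ms)
    return max(candidates, key=lambda c: c.est_accuracy)


def route_blocks(
    scores: Dict[Block, float], threshold: float = 0.5
) -> Tuple[List[Block], List[Block]]:
    """Split blocks into 'hard' (score >= threshold) and 'easy' (below it).

    In this sketch, hard blocks would go to a detector and easy blocks to a
    lightweight tracker; the scores stand in for a policy network's output.
    """
    hard = sorted(b for b, s in scores.items() if s >= threshold)
    easy = sorted(b for b, s in scores.items() if s < threshold)
    return hard, easy
```

For example, given three candidate configurations, `select_config(candidates, 0.8)` returns the lowest-latency one whose estimated accuracy is at least 0.8, and `route_blocks` returns two sorted lists of block indices that a pipeline could hand to its detector and tracker.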
URI:	http://hdl.handle.net/11375/32494
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Description	Size	Format
Xu_Renjie_202509_PhD.pdf Open Access		7.16 MB	Adobe PDF
