Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/29934
Title: | TinyML Inference Enablement and Acceleration on Microcontrollers: The Case of Healthcare |
Authors: | Sun, Bailian |
Advisor: | Hassan, Mohamed |
Department: | Electrical and Computer Engineering |
Publication Date: | 2024 |
Abstract: | Controlling high blood pressure can eliminate more than half of the deaths caused by cardiovascular diseases (CVDs). Towards this target, continuous blood pressure (BP) monitoring is a must. Existing Convolutional Neural Network (CNN)-based solutions rely on server-like infrastructure with huge computation and memory capabilities. This renders these solutions impractical and raises several security, privacy, reliability, and latency concerns. To address these challenges, an alternative approach has emerged: running machine learning algorithms directly on tiny devices. The unprecedented boom in tinyML development also drives the high relevance of optimizing network inference strategies on resource-constrained microcontrollers (MCUs). The contributions of the thesis are as follows. First, the thesis contributes to the general field of tinyML by proposing novel techniques that enable fitting five popular CNNs - AlexNet, LeNet, SqueezeNet, ResNet, and MobileNet - into extremely constrained edge devices with limited computation, memory, and power budgets. The proposed techniques use a combination of novel architecture modifications, pruning, and quantization methods. Second, building on this stepping stone, the thesis proposes a tinyML-based solution to enable accurate and continuous BP estimation using only photoplethysmogram (PPG) signals. Third, the thesis proposes several techniques to accelerate the CNN inference process. From a hardware perspective, we discuss architecture-aware accelerations that exploit cache and multi-core specifications; from a software perspective, we develop application-aware optimizations with an existing real-time compatible C library to maximize computation and intermediate buffer reuse. These solutions require only general MCU features, thus demonstrating broad generalization across various networks and devices.
We conduct an extensive evaluation using data from thousands of real Intensive Care Unit (ICU) patients, several tiny edge devices, and all five aforementioned CNNs. Results show accuracy comparable to server-based solutions, and the proposed acceleration strategies achieve up to a 71% reduction in inference latency. |
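The abstract names pruning and quantization among the techniques used to fit CNNs onto MCUs. As a minimal illustration of one such technique family, the sketch below shows symmetric per-tensor int8 post-training quantization of a weight tensor, a common way to shrink model memory footprint roughly fourfold for microcontroller deployment. This is a generic textbook example, not a reproduction of the thesis's actual methods; the function names and the toy weight values are invented for illustration.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q,
    with q an int8 array and scale a single float."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from (q, scale)."""
    return q.astype(np.float32) * scale

# Toy example: float32 weights shrink to 1/4 the storage as int8.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

On an MCU, only `q` (int8) and `s` would be stored, and inference kernels would operate on the integer values directly, which also maps well to the integer SIMD instructions available on many microcontrollers.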
URI: | http://hdl.handle.net/11375/29934 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Sun_Bailian_202406_Master.pdf | | 1.39 MB | Adobe PDF
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.