Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/29934
Title: | TinyML Inference Enablement and Acceleration on Microcontrollers: The Case of Healthcare |
Authors: | Sun, Bailian |
Advisor: | Hassan, Mohamed |
Department: | Electrical and Computer Engineering |
Publication Date: | 2024 |
Abstract: | Controlling high blood pressure can eliminate more than half of the deaths caused by cardiovascular diseases (CVDs). Towards this target, continuous blood pressure (BP) monitoring is a must. Existing Convolutional Neural Network (CNN)-based solutions rely on server-like infrastructure with huge computation and memory capabilities. This renders these solutions impractical and raises several security, privacy, reliability, and latency concerns. To address these challenges, an alternative approach has emerged: running machine learning algorithms directly on tiny devices. The unprecedented boom in tinyML development also drives the high relevance of optimizing network inference strategies on resource-constrained microcontrollers (MCUs). The contributions of the thesis are as follows. First, the thesis contributes to the general field of tinyML by proposing novel techniques that enable fitting five popular CNNs - AlexNet, LeNet, SqueezeNet, ResNet, and MobileNet - into extremely constrained edge devices with limited computation, memory, and power budgets. The proposed techniques use a combination of novel architecture modifications, pruning, and quantization methods. Second, building on this stepping stone, the thesis proposes a tinyML-based solution to enable accurate and continuous BP estimation using only photoplethysmogram (PPG) signals. Third, the thesis proposes several techniques to accelerate the CNN inference process. From a hardware perspective, we discuss architecture-aware accelerations that exploit cache and multi-core specifications; from a software perspective, we develop application-aware optimizations with an existing real-time compatible C library to maximize computation and intermediate buffer reuse. These solutions require only general MCU features, thus demonstrating broad generalization across various networks and devices.
We conduct an extensive evaluation using data from thousands of real Intensive Care Unit (ICU) patients, several tiny edge devices, and all five aforementioned CNNs. Results show accuracy comparable to server-based solutions, and the proposed acceleration strategies achieve up to a 71% reduction in inference latency. |
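The abstract names pruning and quantization among the techniques used to fit CNNs onto MCUs. As a minimal illustration of one such technique family, the sketch below shows symmetric per-tensor int8 post-training quantization of a weight tensor, a common way to shrink model memory footprint roughly fourfold for microcontroller deployment. This is a generic textbook example, not a reproduction of the thesis's actual methods; the function names and the toy weight values are invented for illustration.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q,
    with q an int8 array and scale a single float."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from (q, scale)."""
    return q.astype(np.float32) * scale

# Toy example: float32 weights shrink to 1/4 the storage as int8.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

On an MCU, only `q` (int8) and `s` would be stored, and inference kernels would operate on the integer values directly, which also maps well to the integer SIMD instructions available on many microcontrollers.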
URI: | http://hdl.handle.net/11375/29934 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Sun_Bailian_202406_Master.pdf | | 1.39 MB | Adobe PDF
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.