Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/31980
Title: Self-Supervised Masked Autoencoding Meets Federated Learning for Electric Vehicle Battery State-of-Health Estimation
Authors: Ismail, Mohanad
Advisor: Ahmed, Ryan
Department: Mechanical Engineering
Keywords: Electric Vehicle; State-of-Health Estimation; Self-Supervised Learning; Masked Autoencoding; Federated Learning; Cloud Computing; Edge Computing; Fine-Tuning Strategies; Masking Ratio Optimization; Data Scarcity and Heterogeneity; Privacy-Preserving Machine Learning
Publication Date: 2025
Abstract: EVs live and die by their batteries. To keep drivers safe and confident in their vehicles, we need efficient, accurate, and private ways to track each battery's state of health (SoH). But labelled EV data is scarce, sharing raw data raises privacy concerns, and large models strain on-board hardware. This thesis tackles all three problems at once through a two-step remedy (sketched in code below):
  1. Learn data representations without labels: each car trains a small autoencoder to reconstruct its own collected sensor data after randomly hiding parts of the signal.
  2. Share knowledge, not data: instead of uploading raw data, every car sends only its trained model parameters to a remote cloud server; the server aggregates the parameters from all cars and sends the improved model back.
Four questions guide the work:
  1. Does this use of unlabelled data improve the model's performance?
  2. How much of the signal should be hidden to learn the best representations?
  3. What is the best strategy for incorporating the limited labelled data into the model?
  4. Does aggregating separately trained models hurt accuracy compared with fully centralized training?
Our experiments show a 17% lower average MAE, with up to a 60% improvement in the best cases, when the available unlabelled data is used rather than training exclusively on labelled data. Hiding 30-40% of the signal strikes the best balance between challenge and clarity. Finally, model aggregation stays within 0.05 Ah of centralized training on average, virtually no loss in accuracy, with zero raw-data exposure. The thesis combines cloud computing, self-supervised learning (SSL), and federated learning (FL) into a lightweight, privacy-friendly pipeline for fleet-wide SoH estimation; it offers evidence that unfrozen fine-tuning outperforms frozen variants, the first systematic look at how the masking ratio shapes battery time-series representation learning, and practical proof that sharing model weights instead of data leaves accuracy essentially untouched and privacy intact.
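The two steps above lend themselves to a compact illustration. The following is a minimal sketch, not the thesis's implementation: the 1D-convolutional architecture, channel counts, window length, masking scheme, and the plain parameter-averaging rule (FedAvg-style) are all assumptions made for the example.

    import copy
    import torch
    import torch.nn as nn

    class MaskedAutoencoder(nn.Module):
        """Step 1: reconstruct sensor windows after randomly hiding time steps."""
        def __init__(self, n_channels: int = 4, hidden: int = 32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            )
            self.decoder = nn.Conv1d(hidden, n_channels, kernel_size=5, padding=2)

        def forward(self, x: torch.Tensor, mask_ratio: float = 0.35) -> torch.Tensor:
            # Hide a random fraction of time steps (30-40% worked best per the thesis).
            keep = (torch.rand(x.size(0), 1, x.size(2), device=x.device) > mask_ratio).float()
            recon = self.decoder(self.encoder(x * keep))
            hidden = (1.0 - keep).expand_as(x)
            # Reconstruction loss is computed only on the hidden positions.
            return ((recon - x) ** 2 * hidden).sum() / hidden.sum().clamp(min=1.0)

    def fedavg(client_states):
        """Step 2: the server averages client parameters; raw data never leaves the car."""
        avg = copy.deepcopy(client_states[0])
        for key in avg:
            avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
        return avg

    # One communication round with three simulated vehicles.
    global_model = MaskedAutoencoder()
    client_states = []
    for _ in range(3):
        local = copy.deepcopy(global_model)
        opt = torch.optim.Adam(local.parameters(), lr=1e-3)
        loss = local(torch.randn(8, 4, 128))  # stand-in for real sensor windows
        opt.zero_grad()
        loss.backward()
        opt.step()
        client_states.append(local.state_dict())
    global_model.load_state_dict(fedavg(client_states))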
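The abstract's contrast between unfrozen and frozen fine-tuning comes down to whether the pretrained encoder's parameters keep learning once labels arrive. A hypothetical snippet, with stand-in encoder and SoH-head modules (the real architectures are in the thesis):

    import torch
    import torch.nn as nn

    # Stand-ins for the pretrained encoder and a small SoH regression head.
    encoder = nn.Sequential(nn.Conv1d(4, 32, kernel_size=5, padding=2), nn.ReLU())
    head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1))

    def trainable_params(freeze_encoder: bool):
        # Frozen variant: the pretrained encoder becomes a fixed feature extractor
        # and only the head learns from the scarce labelled data.
        # Unfrozen variant (the better performer): labels update everything.
        for p in encoder.parameters():
            p.requires_grad = not freeze_encoder
        params = list(encoder.parameters()) + list(head.parameters())
        return [p for p in params if p.requires_grad]

    opt = torch.optim.Adam(trainable_params(freeze_encoder=False), lr=1e-4)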
URI: http://hdl.handle.net/11375/31980
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:
File: ismail_mohanad_2025july_masc.pdf (Open Access)
Size: 15.74 MB
Format: Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
