
HEPPO: Hardware-Efficient Proximal Policy Optimization. A Universal Pipelined Architecture for Generalized Advantage Estimation

dc.contributor.advisor: Abdelhadi, Ameer
dc.contributor.author: Taha, Hazem
dc.contributor.department: Electrical and Computer Engineering [en_US]
dc.date.accessioned: 2024-12-23T19:50:06Z
dc.date.available: 2024-12-23T19:50:06Z
dc.date.issued: 2024
dc.description.abstract: This thesis presents HEPPO: Hardware-Efficient Proximal Policy Optimization, a framework designed to address the computational and memory challenges of implementing advanced reinforcement learning algorithms on resource-constrained hardware platforms. By introducing dynamic standardization for rewards and an 8-bit quantization strategy, HEPPO reduces memory requirements by up to 75% while improving training stability and performance, achieving up to a 67% increase in cumulative rewards. A novel, highly parallelized architecture for Generalized Advantage Estimation (GAE) computation accelerates this critical phase of training, processing 19.2 billion elements per second with 64 processing elements and contributing to a 22% to 37% reduction in PPO training time across different environments. Adopting the proposed on-chip memory layout further reduces GAE data-transfer latency, raising the reduction in PPO training time to as much as 48% in certain environments. Integrating the entire PPO pipeline on a single System-on-Chip (SoC) improves overall system performance by reducing communication overhead and leveraging custom hardware acceleration. Experimental evaluations demonstrate that HEPPO effectively bridges the gap between sophisticated reinforcement learning algorithms and practical hardware implementations, enabling efficient deployment in embedded systems and real-time applications. [en_US]
dc.description.degree: Master of Applied Science (MASc) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.description.layabstract: Reinforcement Learning (RL) enables agents to acquire knowledge and make decisions by interacting with their environment, much as humans learn from experience. RL has been applied in many industrial domains, such as health care and finance. However, implementing sophisticated RL algorithms on small, resource-limited devices such as embedded systems or edge devices is often difficult because these algorithms require significant computational power and memory. This thesis presents HEPPO, a novel framework for efficiently executing Proximal Policy Optimization (PPO), a widely used reinforcement learning algorithm, across several hardware platforms. By optimizing critical bottlenecks of the algorithm and developing a customized hardware architecture, HEPPO markedly reduces computational requirements and memory consumption without compromising performance. The proposed framework enables real-time deployment of intelligent learning algorithms on devices such as drones, robots, and other smart systems, augmenting their capabilities and opening new avenues for innovation across diverse industries. [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/30655
dc.language.iso: en_US [en_US]
dc.subject: Reinforcement Learning (RL) [en_US]
dc.subject: Proximal Policy Optimization (PPO) [en_US]
dc.subject: Hardware Acceleration [en_US]
dc.subject: Generalized Advantage Estimation (GAE) [en_US]
dc.subject: Embedded Systems [en_US]
dc.subject: Resource-Constrained Environments [en_US]
dc.subject: Custom Hardware Architectures [en_US]
dc.subject: Quantization Techniques [en_US]
dc.subject: FPGA-Based Acceleration [en_US]
dc.subject: Deep Learning Optimization [en_US]
dc.title: HEPPO: Hardware-Efficient Proximal Policy Optimization. A Universal Pipelined Architecture for Generalized Advantage Estimation [en_US]
dc.type: Thesis [en_US]
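The abstract names GAE and 8-bit quantization without showing the underlying computation. As a minimal software sketch for orientation only, the block below implements the standard GAE backward recursion (Schulman et al.), plus a hypothetical standardize-then-quantize-to-int8 step. It is not HEPPO's hardware pipeline or its dynamic standardization scheme: the function names, the clipping range of 4 standard deviations, and the symmetric linear quantizer are all assumptions. An int8 value occupies one byte versus four for a float32, a 75% saving, which is consistent with the abstract's "up to 75%" figure if a 32-bit baseline is assumed.

```python
import numpy as np


def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Standard GAE via a backward recursion over one trajectory.

    rewards, dones: length T; values: length T + 1, where values[T] is the
    bootstrap value of the state following the last step.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets (returns) are advantages plus the baseline V(s_t).
    returns = advantages + values[:T]
    return advantages, returns


def standardize_and_quantize(x, mean, std, clip=4.0):
    """Hypothetical illustration: standardize with given statistics, clip to
    +/- `clip` standard deviations, then map linearly onto int8 [-127, 127]."""
    z = np.clip((np.asarray(x) - mean) / (std + 1e-8), -clip, clip)
    scale = clip / 127.0  # one int8 step in standardized units
    q = np.round(z / scale).astype(np.int8)
    return q, scale


def dequantize(q, scale, mean, std):
    """Approximate inverse of standardize_and_quantize."""
    return q.astype(np.float32) * scale * (std + 1e-8) + mean
```

Because each A_t depends on A_{t+1}, the loop is sequential along the time axis within one trajectory; independent trajectories (for example, from parallel environments) can be processed concurrently.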

Files

Original bundle

Name: taha_hazem_a_202412_masc.pdf
Size: 2.99 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission