Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/30655
Title: HEPPO: Hardware-Efficient Proximal Policy Optimization. A Universal Pipelined Architecture for Generalized Advantage Estimation
Authors: Taha, Hazem
Advisor: Abdelhadi, Ameer
Department: Electrical and Computer Engineering
Keywords: Reinforcement Learning (RL); Proximal Policy Optimization (PPO); Hardware Acceleration; Generalized Advantage Estimation (GAE); Embedded Systems; Resource-Constrained Environments; Custom Hardware Architectures; Quantization Techniques; FPGA-Based Acceleration; Deep Learning Optimization
Publication Date: 2024
Abstract: This thesis presents HEPPO: Hardware-Efficient Proximal Policy Optimization, a framework designed to address the computational and memory challenges of implementing advanced reinforcement learning algorithms on resource-constrained hardware platforms. By introducing dynamic standardization for rewards and an 8-bit quantization strategy, HEPPO reduces memory requirements by up to 75% while improving training stability and performance, achieving up to a 67% increase in cumulative rewards. A novel, highly parallelized architecture for Generalized Advantage Estimation (GAE) accelerates this critical phase, processing 19.2 billion elements per second with 64 processing elements and contributing to a 22% to 37% reduction in PPO training time across different environments. Adopting the proposed on-chip memory layout further reduces GAE data-transfer latency, raising the training-time reduction to as much as 48% in certain environments. Integrating the entire PPO pipeline on a single System-on-Chip (SoC) further improves system performance by reducing communication overhead and leveraging custom hardware acceleration. Experimental evaluations demonstrate that HEPPO effectively bridges the gap between sophisticated reinforcement learning algorithms and practical hardware implementations, enabling efficient deployment in embedded systems and real-time applications. (A software reference sketch of the GAE recursion described here appears after the record below.)
URI: http://hdl.handle.net/11375/30655
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File: taha_hazem_a_202412_masc.pdf
Description: Open Access
Size: 3.06 MB
Format: Adobe PDF


