Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/30640
Full metadata record (DC field: value [language])

dc.contributor.advisor:     Nicolici, Nicola
dc.contributor.author:      Pogue, Trevor E.
dc.date.accessioned:        2024-12-16T15:22:48Z
dc.date.available:          2024-12-16T15:22:48Z
dc.date.issued:             2025-06
dc.identifier.uri:          http://hdl.handle.net/11375/30640
dc.description.abstract:    [en_US]
  The field of deep learning has seen increasing breakthroughs and commercial
  adoption in recent years, enabling a wide range of applications including
  image and speech recognition, multimedia generation, information
  summarization, and human-like chatbots. This has led to a growing need for
  hardware that can perform deep learning inference quickly and efficiently,
  a task that increasingly demands massive amounts of computation. To address
  this need, recent years have seen many works on optimizing deep learning
  inference in hardware. Systolic arrays are an efficient class of hardware
  designs and a natural starting point for this application. However, after
  hardware-oriented deep learning model optimizations reach their limits,
  after the known parallelism for executing their compute patterns in
  hardware is exhausted, and after technology scaling slows to a halt, an
  accelerator wall limits further improvement on the implementation side. In
  this thesis, we contribute to this field through an under-explored
  direction by presenting new efficient matrix multiplication algorithms and
  systolic-array hardware architectures for them that increase
  performance-per-area by reducing the workload at the algebraic level: they
  compute the same result from a re-arranged compute pattern that requires
  fewer or cheaper hardware operations. We evaluate our architectures in an
  end-to-end deep learning accelerator, demonstrating their ability to
  increase the performance-per-area of hardware accelerators beyond their
  normal theoretical limits.
dc.language.iso:            en [en_US]
dc.subject:                 hardware [en_US]
dc.subject:                 acceleration [en_US]
dc.subject:                 architecture [en_US]
dc.subject:                 performance [en_US]
dc.subject:                 algorithms [en_US]
dc.subject:                 matrix multiplication [en_US]
dc.subject:                 machine learning [en_US]
dc.subject:                 artificial intelligence [en_US]
dc.title:                   Algebraic Enhancements for Systolic Arrays [en_US]
dc.type:                    Thesis [en_US]
dc.contributor.department:  Electrical and Computer Engineering [en_US]
dc.description.degreetype:  Thesis [en_US]
dc.description.degree:      Doctor of Philosophy (PhD) [en_US]
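
The abstract above describes reducing the arithmetic workload of matrix
multiplication at the algebraic level, so that the same result is computed
from a re-arranged pattern with fewer or cheaper hardware operations. Purely
as an illustration of that idea (not the specific algorithms or architectures
contributed by the thesis), the sketch below uses Winograd's 1968 fast
inner-product rearrangement, a classical algorithm of this kind that trades
roughly half of the multiplications for additions; the function name
winograd_matmul and the use of NumPy are our own illustrative choices.

    import numpy as np

    def winograd_matmul(A, B):
        """Compute C = A @ B via Winograd's 1968 fast inner-product
        rearrangement (illustrative sketch; requires an even inner dim)."""
        m, n = A.shape
        n2, p = B.shape
        assert n == n2 and n % 2 == 0, "inner dimension must match and be even"

        # Per-row and per-column correction terms, each computed once and
        # then reused for every output element in that row/column:
        #   xi[i]  = sum_j A[i, 2j] * A[i, 2j+1]
        #   eta[k] = sum_j B[2j, k] * B[2j+1, k]
        xi = np.einsum('ij,ij->i', A[:, 0::2], A[:, 1::2])
        eta = np.einsum('jk,jk->k', B[0::2, :], B[1::2, :])

        C = np.empty((m, p))
        for i in range(m):
            for k in range(p):
                # n/2 multiplications per output element instead of n; the
                # extra additions are cheaper to implement in hardware.
                s = np.dot(A[i, 0::2] + B[1::2, k], A[i, 1::2] + B[0::2, k])
                C[i, k] = s - xi[i] - eta[k]
        return C

    # Quick self-check against the direct product
    A = np.random.rand(4, 6)
    B = np.random.rand(6, 5)
    assert np.allclose(winograd_matmul(A, B), A @ B)

The appeal of such rearrangements for hardware is that the correction terms
are computed once and amortized across an entire row or column of outputs, so
each output element needs about half as many multipliers; this is the kind of
algebraic performance-per-area gain the abstract refers to.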
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:

File                                   Description    Size     Format
pogue_trevor_e_2024december_phd.pdf    Open Access    2.68 MB  Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
