Algebraic Enhancements for Systolic Arrays

Abstract

The field of deep learning has seen increasing breakthroughs and commercial adoption in recent years, enabling a wide range of applications including image and speech recognition, multimedia generation, information summarization, and human-like chatbots. This has created a growing need for hardware that can quickly and efficiently perform deep learning inference, which increasingly demands massive amounts of computational power. To address this need, recent years have seen many works on optimizing deep learning inference in hardware, and systolic arrays are an efficient class of hardware designs that serve as a natural starting point for this application. However, after hardware-oriented deep learning model optimizations reach their limits, after the known parallelism for executing their compute patterns in hardware is exhausted, and after technology scaling slows to a halt, an accelerator wall limits further improvement on the implementation side. In this thesis, we contribute to this field along an under-explored direction by presenting new efficient matrix multiplication algorithms and their systolic-array hardware architectures that increase performance-per-area by reducing the workload at the algebraic level: they compute the same result from a re-arranged compute pattern that requires fewer or cheaper operations to be performed in hardware. We evaluate our architectures in an end-to-end deep learning accelerator, demonstrating their ability to increase the performance-per-area of hardware accelerators beyond their normal theoretical limits.
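To make the idea of an algebraic workload reduction concrete, below is a minimal sketch of one classic rearrangement of this kind: Winograd's 1968 inner-product algorithm, which computes each dot product with roughly half the multiplications by pairing inputs, with the row and column correction terms amortized across a full matrix multiplication. This is an illustration of the general technique only, not necessarily the thesis's algorithm, and the plain-Python function below is a hypothetical reference model, not a hardware implementation.

# Illustrative sketch: Winograd's 1968 inner-product rearrangement.
# Each C[i][p] uses k/2 multiplications instead of k; the xi/eta
# correction terms cost m*k/2 and n*k/2 multiplications once, then
# amortize across all m*n outputs.

def winograd_matmul(A, B):
    """C = A @ B via Winograd's inner-product form (even inner dim)."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    assert k % 2 == 0, "inner dimension must be even for this sketch"

    # Row terms: xi[i] = sum over j of A[i][2j] * A[i][2j+1].
    xi = [sum(A[i][2*j] * A[i][2*j+1] for j in range(k // 2))
          for i in range(m)]
    # Column terms: eta[p] = sum over j of B[2j][p] * B[2j+1][p].
    eta = [sum(B[2*j][p] * B[2*j+1][p] for j in range(k // 2))
           for p in range(n)]

    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for p in range(n):
            # (a0 + b1)(a1 + b0) expands to a0*a1 + b0*b1 + a0*b0 + a1*b1,
            # so subtracting xi and eta leaves exactly the dot product.
            s = sum((A[i][2*j] + B[2*j+1][p]) * (A[i][2*j+1] + B[2*j][p])
                    for j in range(k // 2))
            C[i][p] = s - xi[i] - eta[p]
    return C

# Example: a 2x2 product, matching the naive result.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert winograd_matmul(A, B) == [[19, 22], [43, 50]]

In hardware terms, rearrangements like this trade multiplications for additions, which is attractive because multipliers typically dominate the area of a systolic array's processing elements; this is the kind of saving the abstract refers to as reducing the workload at the algebraic level.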
