Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/30641
Title: | Efficient Modeling for DNN Hardware Resiliency Assessment |
Authors: | Mahmoud, Karim |
Advisor: | Nicolici, Nicola |
Department: | Electrical and Computer Engineering |
Keywords: | Hardware Fault Assessment; Integer Arithmetic Circuits; Deep Neural Networks; Machine Learning Hardware Accelerators |
Publication Date: | 2025 |
Abstract: | Deep neural network (DNN) hardware accelerators are critical enablers of the current resurgence in machine learning technologies. Adopting machine learning in safety-critical systems imposes additional reliability requirements on hardware design. Addressing these requirements mandates an accurate assessment of the impact caused by permanent faults in the processing engines (PEs). Carrying out this reliability assessment early in the design process allows potential reliability concerns to be addressed when design revisions are less costly. However, the large size of modern DNN hardware and the complexity of the DNN applications running on it present barriers to efficient reliability evaluation before proceeding with the design implementation. Considering these barriers, this dissertation proposes two methodologies to assess fault resiliency in integer arithmetic units in DNN hardware. Using the information from the data streaming patterns of the DNN accelerators, which are known before the register-transfer level (RTL) implementation, the first methodology enables fault injection experiments to be carried out in PE units at the pre-RTL stage during architectural design space exploration. This is achieved in a DNN simulation framework that captures the mapping between a model's operations and the hardware's arithmetic units. This facilitates a fault resiliency comparison of state-of-the-art DNN accelerators comprising thousands of PE units. The second methodology introduces accurate and efficient modelling of the impact of permanent faults in integer multipliers. It avoids the need for computationally intensive circuit models, e.g., netlists, to inject faults in integer arithmetic units, thus scaling the fault resiliency assessment to accelerators with thousands of PE units with negligible simulation time overhead.
As a first step, we formally analyze the impact of permanent faults affecting the internal nodes of two integer multiplier architectures. This analysis indicates that, for most internal faults, the impact on the output is independent of the operands involved in the arithmetic operation. As the second step, we develop a statistical fault injection approach based on the likelihood of a fault being triggered in the applications that run on the target DNN hardware. By modelling the impact of faults in internal nodes of arithmetic units using fault-free operations, fault injection campaigns run three orders of magnitude faster than campaigns that use arithmetic circuit models in the same simulation environment. The experiments also show that the proposed method's accuracy is on par with that of using netlists to model the arithmetic circuitry in which faults are injected. Using the proposed methods, one can conduct fault assessment experiments for various DNN models and hardware architectures, examining how DNN model-related and hardware architecture-related features affect the DNN accelerator's reliability. In addition to clarifying the impact of permanent hardware faults on the accuracy of DNN models running on defective hardware, the outcomes of these experiments can yield valuable insights for designers seeking to balance fault criticality and performance, thereby facilitating the development of more reliable DNN hardware in the future. |
URI: | http://hdl.handle.net/11375/30641 |
Appears in Collections: | Open Access Dissertations and Theses |
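Illustrative note (not part of the dissertation): the abstract's idea of modelling an internal multiplier fault with fault-free operations can be sketched for one simple case. The sketch below assumes a hypothetical unsigned array multiplier in which the faulty node is the output of a single partial-product AND gate; the function names, the 8-bit default width, and the choice of fault site are assumptions made for illustration and are not taken from the dissertation, which analyzes two multiplier architectures not detailed in this record. Under these assumptions, a stuck-at fault changes the product by a fixed power of two whenever the fault is triggered, so the faulty result can be computed from the fault-free product without a bit-level circuit model.

```python
def bit_level_multiply(a, b, width=8, stuck=None):
    """Sum partial products bit by bit, like a simple unsigned array multiplier.

    stuck = (i, j, value) forces the partial product of bit i of b and
    bit j of a to `value` (0 or 1), modelling a permanent stuck-at fault.
    This is a hypothetical fault site chosen for illustration.
    """
    total = 0
    for i in range(width):
        for j in range(width):
            pp = ((a >> j) & 1) & ((b >> i) & 1)
            if stuck is not None and (i, j) == (stuck[0], stuck[1]):
                pp = stuck[2]
            total += pp << (i + j)
    return total


def arithmetic_fault_model(a, b, i, j, value):
    """Same fault, expressed with fault-free arithmetic only.

    The fault-free partial product bit is replaced by `value`, so the
    product shifts by +/- 2**(i + j) when the fault is triggered and is
    unchanged otherwise.
    """
    fault_free_bit = ((a >> j) & 1) & ((b >> i) & 1)
    return a * b + ((value - fault_free_bit) << (i + j))


if __name__ == "__main__":
    # Stuck-at-0 on the partial product of b bit 1 and a bit 2 (weight 2**3).
    print(bit_level_multiply(6, 3, stuck=(1, 2, 0)))
    print(arithmetic_fault_model(6, 3, 1, 2, 0))
```

The arithmetic model needs one multiplication and one shift per operation, whereas the bit-level loop costs on the order of width squared steps; this is only a toy analogue of the speed-up over netlist simulation that the abstract reports.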
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Mahmoud_Karim_OR_2024December_PhD.pdf | | 9.62 MB | Adobe PDF |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.