Stream-Based Intelligent Memory Architectures for High-Performance and Predictable Real-Time Systems
Abstract
Emerging cyber-physical platforms such as autonomous vehicles and unmanned aerial
systems increasingly integrate high-throughput workloads (e.g., perception and machine
learning) with safety-critical real-time control on the same multicore, multi-channel Dynamic Random Access Memory (DRAM)-based system. While processor cores continue to scale, the memory system remains a major bottleneck: conventional hardware
prefetchers and memory controllers are largely oblivious to program structure, leading
to poor bandwidth utilization, high energy consumption, and highly variable memory access latency
that undermines real-time guarantees.
This dissertation proposes a unified Hardware/Software (HW/SW) interface that
leverages software-provided information, known prior to execution, to describe future
memory access behavior to the hardware. A stream defines the underlying large, array-like data structure over which this access behavior, whether regular or irregular, occurs.
Using compact stream descriptors, the software communicates future access sequences
to a centralized hardware engine, which tags last-level cache misses and coordinates
stream-aware optimizations across the memory hierarchy.
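As an illustration of the idea only, a stream descriptor might be sketched as follows. The abstract does not specify the descriptor format, so every field name here (`base`, `elem_size`, `length`, `stride`, `indices`) and the `tag_miss` helper are hypothetical; the sketch simply shows how a compact description can yield a future access sequence and how a last-level cache miss could be attributed to a stream.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor; field names are illustrative, not the thesis's format."""
    stream_id: int
    base: int          # byte address of the first array element
    elem_size: int     # bytes per element
    length: int        # number of elements in the underlying array
    stride: int = 1    # element stride for regular access
    indices: Optional[Sequence[int]] = None  # explicit index order for irregular access

    def addresses(self) -> Iterator[int]:
        """Yield the future access sequence in program order."""
        if self.indices is not None:
            order = self.indices
        else:
            order = range(0, self.length, self.stride)
        for i in order:
            yield self.base + i * self.elem_size

    def contains(self, addr: int) -> bool:
        """True if addr falls inside the array this stream is defined over."""
        return self.base <= addr < self.base + self.length * self.elem_size


def tag_miss(addr: int, streams: Sequence[StreamDescriptor]) -> Optional[int]:
    """Return the id of the stream covering a missing address, or None."""
    for s in streams:
        if s.contains(addr):
            return s.stream_id
    return None
```

A regular stream is described by its stride, while an irregular one carries an explicit index order; in both cases the consumer sees the same kind of address sequence.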
Leveraging this interface, the thesis introduces three architectures. First, InterStellar
and its multi-channel extension InterStellar 2.0 implement stream-aware DRAM controllers that perform intelligent page management and proactive DRAM-aware batching, substantially improving effective bandwidth and reducing row conflicts. Second,
InterStellarRT adapts the same principles to real-time systems by forming analyzable
real-time batches and applying a predictable scheduling policy, enabling tight worst-case memory-latency bounds for stream-based memory patterns. Third, COMPASS
co-designs a stream-aware last-level cache prefetcher with a stream-aware memory controller, coordinating prefetch issuance with DRAM batching to reduce effective miss
latency while sustaining high throughput.
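The benefit of DRAM-aware batching can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the InterStellar scheduling algorithm: the row size, bank count, and address mapping (`ROW_BYTES`, `NUM_BANKS`, `bank_row`) are invented for the example, and the model tracks only one open row per bank.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List, Tuple

ROW_BYTES = 2048   # assumed row-buffer size (illustrative)
NUM_BANKS = 8      # assumed bank count (illustrative)


def bank_row(addr: int) -> Tuple[int, int]:
    """Assumed mapping: consecutive rows interleave across banks."""
    row_index = addr // ROW_BYTES
    return row_index % NUM_BANKS, row_index // NUM_BANKS


def batch_by_row(addrs: Iterable[int]) -> List[int]:
    """Reorder requests so those targeting the same (bank, row) are adjacent."""
    groups: "OrderedDict[Tuple[int, int], List[int]]" = OrderedDict()
    for a in addrs:
        groups.setdefault(bank_row(a), []).append(a)
    return [a for group in groups.values() for a in group]


def row_activations(addrs: Iterable[int]) -> int:
    """Count row activations, keeping one open row per bank (open-page policy)."""
    open_row: Dict[int, int] = {}
    activations = 0
    for a in addrs:
        bank, row = bank_row(a)
        if open_row.get(bank) != row:
            activations += 1
            open_row[bank] = row
    return activations


# Two streams whose elements alternate between two rows of the same bank.
requests = [0, 16384, 64, 16448, 128, 16512]
in_order = row_activations(requests)
batched = row_activations(batch_by_row(requests))
```

With this request list, serving the requests in arrival order activates a row six times, whereas the batched order needs only two activations, because each row is opened once and fully drained before the next one is opened.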
Evaluated across a broad set of scientific and high-performance computing workloads, these three architectures deliver substantial performance and energy improvements over state-of-the-art baselines. InterStellarRT, in addition, provides significantly
tighter and formally analyzable worst-case latency bounds compared to contemporary
real-time memory controllers. Collectively, the contributions demonstrate that stream-based memory intelligence is an effective approach to mitigating the memory-system
bottleneck in modern multicore platforms that integrate cache prefetching mechanisms
and multi-channel DRAM subsystems.
The implementations of InterStellar 2.0, InterStellarRT, and COMPASS are available
in the project repository: https://gitlab.com/fanosteam/fanosgem5.
Each architecture is provided in a separate branch:
1) InterStellar 2.0:
https://gitlab.com/fanosteam/fanosgem5/-/tree/InterStellar-2.0
2) InterStellarRT:
https://gitlab.com/fanosteam/fanosgem5/-/tree/InterStellarRT
3) COMPASS:
https://gitlab.com/fanosteam/fanosgem5/-/tree/COMPASS