Load Value Approximation: Approaching the Ideal Memory Access Latency
Authors
Abstract
Approximate computing recognizes that many applications can tolerate inexactness. These applications, which range from multimedia processing to machine learning, operate on inherently noisy and imprecise data. As a result, we can trade off some loss in output value integrity for improved processor performance and energy efficiency. In this paper, we introduce load value approximation. In modern processors, a load that misses in the private cache must retrieve its data from main memory or from the higher-level caches. These data accesses are costly in terms of both latency and energy. We implement load value approximators, which are hardware structures that learn value patterns and generate approximations of the data. The processor can then use these approximate data values to continue executing without incurring the high cost of accessing memory. We show that load value approximators can achieve high coverage while maintaining very low error in the application's output. By exploiting the approximate nature of applications, we can draw closer to the ideal memory access latency.
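The abstract describes load value approximators only at a high level. Below is a minimal Python sketch under our own assumptions, not the authors' hardware design. A table indexed by the load's program counter (modulo a hypothetical `table_size`, with no tags, so aliasing loads share an entry) keeps a short history of recently loaded values. Once a saturating confidence counter is high enough, the table returns the average of that history as the approximation on a cache miss, and it is trained with the exact value when the memory access eventually returns. All names and parameters (`LoadValueApproximator`, `history_len`, `conf_threshold`, `tolerance`) are illustrative.

```python
from collections import deque


class LoadValueApproximator:
    """Hypothetical software model of a load value approximator (illustrative only)."""

    def __init__(self, table_size=512, history_len=4, conf_max=7,
                 conf_threshold=4, tolerance=0.1):
        self.table_size = table_size          # number of table entries (assumed)
        self.history_len = history_len        # values remembered per entry (assumed)
        self.conf_max = conf_max              # saturating confidence counter ceiling
        self.conf_threshold = conf_threshold  # minimum confidence to approximate
        self.tolerance = tolerance            # relative error still counted as "close"
        self.entries = {}                     # index -> [value history, confidence]

    def _index(self, pc):
        # Untagged direct indexing by PC: distinct loads may alias to one entry.
        return pc % self.table_size

    def approximate(self, pc):
        """On a cache miss, return an approximate value, or None if not confident."""
        entry = self.entries.get(self._index(pc))
        if entry is None:
            return None
        history, conf = entry
        if conf < self.conf_threshold or not history:
            return None
        return sum(history) / len(history)

    def train(self, pc, actual):
        """Update the entry with the exact value once the memory access returns."""
        entry = self.entries.setdefault(
            self._index(pc), [deque(maxlen=self.history_len), 0])
        history = entry[0]
        if history:
            guess = sum(history) / len(history)
            close = abs(guess - actual) <= self.tolerance * max(abs(actual), 1e-9)
            # Gain confidence when the average was close enough, lose it otherwise.
            entry[1] = min(entry[1] + 1, self.conf_max) if close else max(entry[1] - 1, 0)
        history.append(actual)  # deque drops the oldest value when full


# Usage: a slowly varying stream, e.g. neighbouring pixel intensities.
lva = LoadValueApproximator()
pc = 0x400ABC
for v in [100, 102, 101, 103, 102, 104, 103]:
    lva.train(pc, v)
print(lva.approximate(pc))  # 103.0
```

Averaging a short history suits the noisy, slowly varying data that approximate applications tolerate. A conventional load value predictor, by contrast, must match the loaded value exactly and roll back on a misprediction, whereas here a close-enough value is simply consumed.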
Similar resources
Reducing Memory Latency via Read-after-Read Memory Dependence Prediction
We observe that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. To exploit this regularity, we introduce read-after-read (RAR) memory dependence prediction. This technique predicts whether: 1) a load will access a memory location that a preceding load accesses and 2) exactly which preceding load that is. This prediction is done without actual knowledge ...
Exploiting Load Latency Tolerance in Dynamically Scheduled Processors
This paper provides quantitative measurements of load latency tolerance in a dynamically scheduled processor and presents one cache management technique that exploits this information to improve overall performance. We determine the latency of each memory load operation such that the number of instructions issued per cycle (IPC) is comparable to an ideal memory system that satisfies all request...
Improving Context-Based Load Value Prediction
Microprocessors are becoming faster at such a rapid pace that other components like random access memory cannot keep up. As a result, the latency of load instructions grows constantly and already often impedes processor performance. Fortunately, load instructions frequently fetch predictable sequences of values. Load value predictors exploit this behavior to predict the results of load instruct...
Address-free memory access based on program syntax correlation of loads and stores
Increasing cache latency in next-generation processors has a profound performance impact despite advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value ...
Symbolic Cache: Fast Memory Access Based on Program Syntax Correlation of Loads and Stores
Increasing cache latency in next-generation processors has a profound performance impact despite advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value ...