Remote Store Programming: Mechanisms and Performance
Abstract
This paper presents remote store programming (RSP). This paradigm combines usability and efficiency through the exploitation of a simple hardware mechanism, the remote store, which can easily be added to existing multicores. Remote store programs are marked by fine-grained and one-sided communication which results in a stream of data flowing from the registers of a sending process to the cache of a destination process. The RSP model and its hardware implementation trade a relatively high store latency for a low load latency because loads are more common than stores, and it is easier to tolerate store latency than load latency. This paper demonstrates the performance advantages of remote store programming by comparing it to both cache-coherent shared memory and direct memory access (DMA) based approaches using the TILEPro64 processor. The paper studies two applications: a two-dimensional Fast Fourier Transform (2D FFT) and an H.264 encoder for high-definition video. For a 2D FFT using 56 cores, RSP is 1.64× faster than DMA and 4.4× faster than shared memory. For an H.264 encoder using 40 cores, RSP achieves the same performance as DMA and 4.8× the performance of shared memory. Along with these performance advantages, RSP requires the least hardware support of the three. RSP’s features, performance, and hardware simplicity make it well suited to the embedded processing domain.
Similar resources
Remote Store Programming: A Memory Model for Embedded Multicore
This paper presents remote store programming (RSP), a programming paradigm which combines usability and efficiency through the exploitation of a simple hardware mechanism, the remote store, which can easily be added to existing multicores. The RSP model and its hardware implementation trade a relatively high store latency for a low load latency because loads are more common than stores, and it ...
Remote Store Programming: Reflective Memory for Multicore
This work presents remote store programming (RSP), an instance of the reflective memory model designed to be incrementally supportable on multicores that support loads and stores. To demonstrate the value of RSP, its performance is compared to that of both shared and distributed memory approaches using the TILEPro64 multicore processor. RSP is shown to be as much as 1.76× faster than distribute...
FaRM: Fast Remote Memory
We describe the design and implementation of FaRM, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state of the art main memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can use transactions to allocate, read, write, and free o...
Higher-order Distributed Computation over Autonomous Persistent Stores
The traditional approach for building distributed applications is by calling a procedure in another store using an RPC mechanism. However, an RPC requires a round-trip network delay for every call and makes each store dependent on the availability of other stores. A solution to this problem is to migrate the remote objects needed to the client store, and in particular the remote procedures them...
Electrode Materials for Lithium Ion Batteries: A Review
Electrochemical energy storage systems are categorized into different types, according to their mechanisms, including capacitors, supercapacitors, batteries and fuel cells. All battery systems include some main components: anode, cathode, an aqueous/non-aqueous electrolyte and a membrane that separates anode and cathode while being permeable to ions. Being one of the key parts of any new electr...