Cache Aware Scheduling for Synchronous Dataflow Programs
Authors
Abstract
The Synchronous Dataflow (SDF) model of computation [1] is an efficient and popular way to represent signal processing systems. In an SDF model, the amount of data produced and consumed by a dataflow actor is specified a priori for each input and output. SDF specifications therefore allow highly optimized schedules to be generated statically, according to one or more criteria such as minimum buffer size, maximum throughput, maximum processor utilization, or minimum program memory. In this report, we analyze the effect of cache architecture on the execution time of an SDF schedule and develop a new heuristic approach to generating SDF schedules with reduced execution time for a particular cache architecture. We consider the implementation of well-ordered SDF graphs on a single embedded Digital Signal Processor (DSP), assuming a simple Harvard-architecture DSP with single-level caches and separate instruction and data memories. To predict execution times, we propose a cache management policy for the data cache and argue that this policy outperforms traditional cache policies when executing SDF models. We also replace the instruction cache with a scratchpad memory that uses a software-controlled replacement policy. Using our data cache and instruction scratchpad policies, we show that different schedules can have vastly different execution times for the same data cache and instruction scratchpad sizes. In addition, we show that existing scheduling techniques often create schedules that perform poorly with respect to cache usage.
To improve cache performance, an optimal cache-aware scheduler would minimize the total cache miss penalty by considering data and instruction miss penalties simultaneously. Unfortunately, reducing data cache misses often increases instruction scratchpad misses, and vice versa.
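To make the data/instruction trade-off concrete, the following Python sketch (Python 3.9+) solves the SDF balance equations for a three-actor chain and scores two valid schedules under a deliberately crude cost model. The chain A → B → C, its rates, the helper names `repetitions` and `evaluate`, and the cost model are all illustrative assumptions, not the report's policies: peak buffered tokens stand in for data-cache footprint, and the instruction scratchpad is assumed to hold exactly one actor's code, so every switch to a different actor counts as one miss.

```python
from fractions import Fraction
from math import gcd, lcm


def repetitions(rates):
    """Solve the SDF balance equations for a chain A0 -> A1 -> ... -> An.

    rates[i] = (tokens produced by actor i per firing,
                tokens consumed by actor i+1 per firing).
    Returns the smallest positive integers q with q[i]*prod == q[i+1]*cons.
    """
    q = [Fraction(1)]
    for prod, cons in rates:
        q.append(q[-1] * prod / cons)
    scale = lcm(*(f.denominator for f in q))  # clear all denominators
    ints = [int(f * scale) for f in q]
    g = gcd(*ints)                            # reduce to the minimal solution
    return [v // g for v in ints]


def evaluate(schedule, rates):
    """Return (peak_buffer_tokens, instruction_misses) for a chain schedule.

    Toy cost model (assumption): data footprint is the sum over edges of the
    peak token count on that edge; the scratchpad holds one actor's code, so
    each change of actor (including the first load) is one miss.
    """
    tokens = [0] * len(rates)
    peak = [0] * len(rates)
    resident, misses = None, 0
    for actor in schedule:
        if actor > 0:  # consume from the single input edge
            need = rates[actor - 1][1]
            if tokens[actor - 1] < need:
                raise ValueError(f"actor {actor} fired without enough tokens")
            tokens[actor - 1] -= need
        if actor < len(rates):  # produce onto the output edge (last actor is a sink)
            tokens[actor] += rates[actor][0]
            peak[actor] = max(peak[actor], tokens[actor])
        if actor != resident:  # code not in the scratchpad: load it
            misses += 1
            resident = actor
    return sum(peak), misses


rates = [(2, 3), (1, 2)]            # A -(2/3)-> B -(1/2)-> C
print(repetitions(rates))           # [3, 2, 1]
grouped = [0, 0, 0, 1, 1, 2]        # A A A B B C
interleaved = [0, 0, 1, 0, 1, 2]    # A A B A B C
print(evaluate(grouped, rates))     # (8, 3)
print(evaluate(interleaved, rates)) # (6, 5)
```

The grouped schedule loads each actor's code only once but buffers more tokens; interleaving shrinks the buffers at the cost of extra code loads, which is the tension a cache-aware scheduler has to balance.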
In this report, we show that the number of schedules that must be considered grows exponentially with the vectorization factor of the schedule. To address this complexity, we develop an SDF scheduling algorithm based on a greedy, cache-aware heuristic and compare the resulting schedules with those generated by existing SDF scheduling schemes. The schedules generated by our algorithm pose an interesting code-generation problem, for which we also propose a solution. This work is highly applicable to the design of SDF systems that are implemented as Systems on Chip (SoC) with DSP cores.
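A small count illustrates why exhaustive search quickly becomes impractical. The sketch below is an illustration, not the report's analysis: it assumes a two-actor chain A → B whose edge starts empty, and it takes the vectorization factor `J` to scale the repetitions vector, so one schedule period fires each actor `J` times its balance-equation count. It counts every firing order in which B never runs short of input tokens; the function name `count_schedules` is hypothetical.

```python
from functools import lru_cache
from math import gcd


def count_schedules(prod, cons, J):
    """Count valid firing orders for one period of the chain A -> B.

    A produces `prod` tokens per firing and B consumes `cons`; the edge starts
    empty. The balance equation q_A * prod == q_B * cons gives the minimal
    firing counts, and the vectorization factor J multiplies both (assumption).
    """
    g = gcd(prod, cons)
    q_a, q_b = cons // g, prod // g

    @lru_cache(maxsize=None)
    def ways(a_left, b_left, buffered):
        # Every firing has been placed: one complete, valid schedule.
        if a_left == 0 and b_left == 0:
            return 1
        total = 0
        if a_left > 0:  # A has no input edge, so it can always fire
            total += ways(a_left - 1, b_left, buffered + prod)
        if b_left > 0 and buffered >= cons:  # B needs `cons` tokens on its input
            total += ways(a_left, b_left - 1, buffered - cons)
        return total

    return ways(J * q_a, J * q_b, 0)


for J in range(1, 9):
    print(J, count_schedules(1, 1, J))
# Prints the counts 1, 2, 5, 14, 42, 132, 429, 1430 for J = 1..8.
```

For unit rates these counts are the Catalan numbers, whose successive ratio approaches four, so even this two-actor graph already has 1430 valid orders at J = 8. A greedy heuristic avoids the enumeration by building a single schedule step by step instead of comparing all of them.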