The Spark 2.0 System: A Special Purpose Vector Processor with a VectorPascal Compiler
Abstract
This paper describes the architecture of the Spark 2.0 processor and introduces a compiler for VectorPascal. Key features of the architecture are flexible address generation during vector operations and large memories closely connected to the functional units. The source language allows programs to be written with vector statements, avoiding scalar inner loops. The compiler employs several optimization strategies to exploit these architectural benefits efficiently.
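As a rough illustration of what "vector statements avoiding scalar inner loops" means, the sketch below contrasts the two styles. It is written in Python rather than VectorPascal, and every name in it (`saxpy_scalar`, `saxpy_vector`, `strided`) is hypothetical; it does not reflect the actual VectorPascal syntax or the Spark 2.0 instruction set, only the general idea of whole-vector operations combined with strided address generation.

```python
def saxpy_scalar(a, x, y):
    """Scalar style: an explicit inner loop handles one element per iteration."""
    result = [0.0] * len(x)
    for i in range(len(x)):
        result[i] = a * x[i] + y[i]
    return result


def saxpy_vector(a, x, y):
    """Vector style: a single statement over whole operands, no index variable."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def strided(v, start, stride, count):
    """Hypothetical address generator: picks v[start], v[start + stride], ..."""
    return [v[start + k * stride] for k in range(count)]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# Both styles compute the same result; the vector form leaves the loop implicit.
full = saxpy_vector(2.0, x, y)

# Operating on every second element only changes the address pattern,
# not the vector statement itself.
even = saxpy_vector(2.0, strided(x, 0, 2, 2), strided(y, 0, 2, 2))
```

In a vector processor of the kind the abstract describes, the strided selection would presumably be handled by the address generation hardware rather than by a separate copy, which is what lets a compiler map a statement like this onto a single vector operation.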
Similar resources
Hardware/Compiler Co-development for an Embedded Media Processor
Embedded and portable systems running multimedia applications create a new challenge for hardware architects. The microprocessor needed for such systems is a merged general-purpose processor and digital-signal processor, with the programmability of the former and the performance and power budget of the latter. This paper presents the co-development of the instruction set, the hardware, and the com...
A Methodology for Leveraging Reconfigurability in Domain Specific Languages
Special-purpose hardware can dramatically accelerate an application. However, designing special-purpose hardware is often prohibitively expensive in terms of manpower and time. This paper describes a methodology that uses reconfigurability to enable the efficient compilation of a class of domain specific languages. We present the methodology, a prototype compiler, and a 40Gb/sec network process...
Improving Effective Bandwidth for Streams
Processor speeds are increasing so much faster than memory speeds that within a decade processors may spend most of their time waiting for data. The problem is already acute for computations that linearly traverse long streams of vector-like data. Although streaming computations lack the temporal locality of reference that makes caches effective, they have predictable access patterns. Since mos...
Design and Evaluation of Dynamic Access Ordering
Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression. Although these computations lack the temporal locality of reference that makes caches effective, they have predictable access patterns. Since most modern DRAM components support modes that make it ...
Simty: generalized SIMT execution on RISC-V
We present Simty, a massively multi-threaded RISC-V processor core that acts as a proof of concept for dynamic inter-thread vectorization at the micro-architecture level. Simty runs groups of scalar threads executing SPMD code in lockstep, and assembles SIMD instructions dynamically across threads. Unlike existing SIMD or SIMT processors like GPUs or vector processors, Simty vectorizes scalar g...