Super Scalar Sample Sort
Authors
Abstract
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions. This transformation facilitates optimizations like loop unrolling and software pipelining. The final implementation, although cache-efficient, is limited by a linear number of memory accesses rather than by the O(n log n) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL implementation, which is known as one of the fastest available quicksort implementations.
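The decoupling of comparisons from branches can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the names `build_tree`, `classify`, and `sample_sort_one_level` are hypothetical, the splitters are taken at fixed strides rather than from a random sample, only one level of distribution is performed, and each bucket is finished with std::sort. The key line is the update `j = 2 * j + (x > tree[j])`, which uses the comparison result as data instead of as a branch condition. Whether a compiler turns it into predicated or conditional-move instructions depends on the target and is an assumption here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Store k-1 sorted splitters as an implicit binary search tree
// (1-based heap layout: children of node j are 2j and 2j+1).
static void build_tree(const std::vector<int>& splitters, std::vector<int>& tree,
                       std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = splitters[mid];
    build_tree(splitters, tree, 2 * node, lo, mid);
    build_tree(splitters, tree, 2 * node + 1, mid + 1, hi);
}

// Descend the tree without an if/else: the comparison result (0 or 1)
// is added to the index, so there is no data-dependent branch in the loop.
static std::size_t classify(const std::vector<int>& tree, std::size_t k,
                            unsigned log_k, int x) {
    std::size_t j = 1;
    for (unsigned level = 0; level < log_k; ++level)
        j = 2 * j + static_cast<std::size_t>(x > tree[j]);
    return j - k;  // bucket index in [0, k)
}

// One level of sample sort into k = 2^log_k buckets (simplified sketch).
std::vector<int> sample_sort_one_level(const std::vector<int>& input,
                                       unsigned log_k = 3) {
    const std::size_t n = input.size();
    const std::size_t k = std::size_t{1} << log_k;
    std::vector<int> out(input);
    if (n < 4 * k) {  // too small to be worth distributing
        std::sort(out.begin(), out.end());
        return out;
    }

    // Simplification: pick splitters at fixed strides instead of sampling randomly.
    std::vector<int> splitters(k - 1);
    for (std::size_t i = 0; i + 1 < k; ++i) splitters[i] = input[(i + 1) * n / k];
    std::sort(splitters.begin(), splitters.end());
    std::vector<int> tree(k);
    build_tree(splitters, tree, 1, 0, k - 1);

    // Pass 1: record each element's bucket and count bucket sizes.
    std::vector<std::size_t> bucket_of(n);
    std::vector<std::size_t> start(k + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        bucket_of[i] = classify(tree, k, log_k, input[i]);
        ++start[bucket_of[i] + 1];
    }
    for (std::size_t b = 0; b < k; ++b) start[b + 1] += start[b];

    // Pass 2: scatter elements to their buckets using the recorded indices.
    std::vector<std::size_t> pos(start.begin(), start.end() - 1);
    for (std::size_t i = 0; i < n; ++i) out[pos[bucket_of[i]]++] = input[i];

    // Finish each bucket; a full implementation would recurse instead.
    for (std::size_t b = 0; b < k; ++b)
        std::sort(out.begin() + static_cast<std::ptrdiff_t>(start[b]),
                  out.begin() + static_cast<std::ptrdiff_t>(start[b + 1]));
    return out;
}
```

Calling `sample_sort_one_level(v)` on a `std::vector<int>` returns a sorted copy. Because the classification loop has a fixed trip count and no data-dependent branch, it is the kind of loop that can be unrolled and software-pipelined across several elements, which is the optimization opportunity the abstract refers to.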
Similar resources
BlockQuicksort: Avoiding Branch Mispredictions in Quicksort
Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant ...
BlockQuicksort: How Branch Mispredictions don't affect Quicksort
Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant ...
Vector SD-ROM Filter for Removal of Impulse Noise from Color Images
One well-studied image processing task is the removal of impulse noise from images. Impulse noise can be introduced during image capture, during transmission, or during storage. The signal-dependent rank order mean (SD-ROM) filter has been shown to be effective at removing impulses from 2-D scalar-valued signals while preserving many details and other features. The algorithm is based on a state...
Super Scalar Processor using Chip Level Optical Interconnections
In this paper we present the design of a super-scalar processor constructed using optoelectronic components interconnected via high-speed free-space optical buses.
Super-Scalar Processor Design
A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instructio...