Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Abstract
The bit-reversed permutation is a well-known task in signal processing and is key to efficient implementations of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the …
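The abstract names these strategies without showing them. As a rough illustration only, the C++11 sketch below implements three of the underlying ideas for an array of length n = 2^bits (assumed, not checked, by the first two functions): a naive bitwise-swapping baseline, an inductive update of the reversed index using XOR, and the identity behind the reduction to smaller bit reversals plus a square matrix transposition. The function names (`reverse_bits`, `bitrev_naive`, `bitrev_xor`, `bitrev_by_transpose`) are hypothetical, the XOR recurrence is our own formulation and may differ from the paper's, and `bitrev_by_transpose` is neither template-recursive nor cache-oblivious; it merely checks the decomposition on an out-of-place copy.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the low `bits` bits of x, one bit per iteration (O(bits) time).
inline std::size_t reverse_bits(std::size_t x, unsigned bits) {
  std::size_t r = 0;
  for (unsigned k = 0; k < bits; ++k) {
    r = (r << 1) | (x & 1);  // append the lowest remaining bit of x
    x >>= 1;
  }
  return r;
}

// Naive bitwise swapping: for every i, swap a[i] and a[rev(i)] exactly once.
template <typename T>
void bitrev_naive(std::vector<T>& a, unsigned bits) {
  const std::size_t n = std::size_t(1) << bits;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t j = reverse_bits(i, bits);
    if (i < j) std::swap(a[i], a[j]);  // i < j avoids undoing a swap
  }
}

// Inductive XOR update (hypothetical formulation):
//   rev(i + 1) = rev(i) ^ rev(i ^ (i + 1)).
// i ^ (i + 1) is a run of t + 1 low ones, where t is the number of trailing
// ones of i, so its reversal is simply the top t + 1 bits of the index.
template <typename T>
void bitrev_xor(std::vector<T>& a, unsigned bits) {
  const std::size_t n = std::size_t(1) << bits;
  std::size_t rev = 0;  // rev(0) == 0
  for (std::size_t i = 0; i + 1 < n; ++i) {  // rev(n - 1) == n - 1: no swap
    if (i < rev) std::swap(a[i], a[rev]);
    unsigned run = 1;  // length of the run of ones in i ^ (i + 1)
    for (std::size_t x = i; x & 1; x >>= 1) ++run;
    rev ^= ((std::size_t(1) << run) - 1) << (bits - run);  // rev == rev(i + 1)
  }
}

// Matrix view for even `bits`: with m = 2^(bits/2) and i = h * m + l,
// rev(i) = rev_half(l) * m + rev_half(h). The permutation is therefore two
// half-size bit reversals (of row and column indices) plus a transpose.
// Out-of-place and NOT cache-oblivious: it only demonstrates the identity.
template <typename T>
std::vector<T> bitrev_by_transpose(const std::vector<T>& a, unsigned bits) {
  assert(bits % 2 == 0 && a.size() == (std::size_t(1) << bits));
  const unsigned half = bits / 2;
  const std::size_t m = std::size_t(1) << half;
  std::vector<T> out(a.size());
  for (std::size_t h = 0; h < m; ++h)
    for (std::size_t l = 0; l < m; ++l)
      out[reverse_bits(l, half) * m + reverse_bits(h, half)] = a[h * m + l];
  return out;
}
```

For example, applied to the identity array 0, 1, …, 15 with bits = 4, each function places 8 at index 1 and 12 at index 3, since 0001 reverses to 1000 and 0011 to 1100. The XOR variant replaces the per-index `reverse_bits` call with a constant-size update plus a count of trailing ones, which is the kind of saving an inductive method aims for.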
Similar resources
CBEAM: Efficient Authenticated Encryption from Feebly One-Way ϕ Functions
We show how efficient and secure cryptographic mixing functions can be constructed from low-degree rotation-invariant φ functions rather than conventional S-Boxes. These novel functions have surprising properties; many exhibit inherent feeble (Boolean circuit) one-wayness and offer speed/area tradeoffs unobtainable with traditional constructs. Recent theoretical results indicate that even if th...
Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations' Perspective
Intel Extended Memory 64 Technology (EM64T) and AMD 64-bit architecture (AMD64) are emerging 64-bit x86 architectures that are fully x86 compatible. Compared with the 32-bit x86 architecture, the 64-bit x86 architectures cater some new features to applications. For instance, applications can address 64 bits of virtual memory space, perform operations on 64-bit-wide operands, get access to 16 ge...
Architectural Enhancements for Fast Subword Permutations with Repetitions in Cryptographic Applications
We propose two new instructions, swperm and sieve, that can be used to efficiently complete an arbitrary bit-level permutation of an n-bit word with or without repetitions. Permutations with repetitions are rearrangements of an ordered set in which elements may replace other elements in the set; such permutations are useful in cryptographic algorithms. On a 4-way superscalar processor, an arbit...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
Porting Linux to x86-64
x86-64 is a 64-bit extension for the IA32 architecture, which is supported by the next generation of AMD CPUs. New features include 64-bit pointers, a 48-bit address space, 16 general purpose 64-bit integer registers, 16 SSE (Streaming SIMD Extensions) registers, and a compatibility mode to support old binaries. The Linux kernel port to x86-64 is based on the existing IA32 port with some extens...
Journal title:
- CoRR
Volume abs/1708.01873
Publication date 2017