An O(1) Time Complexity Software Barrier
Authors
Abstract
As network latency rapidly approaches thousands of processor cycles and multiprocessor systems become larger and larger, the primary factor in determining a barrier algorithm’s performance is the number of serialized network latencies it requires. All existing barrier algorithms require at least O(log N) round trip message latencies to perform a single barrier operation on an N-node shared memory multiprocessor. In addition, existing barrier algorithms are not well tuned to the way they interact with modern shared memory systems, which leads to an excessive number of message exchanges to signal barrier completion. The contributions of this paper are threefold. First, we identify and quantitatively analyze the performance deficiencies of conventional barrier implementations when they are executed on real (non-idealized) hardware. Second, we propose a queue-based barrier algorithm that has effectively O(1) time complexity as measured in round trip message latencies. Third, by exploiting a hardware write-update (PUT) mechanism for signaling, we demonstrate how carefully matching the barrier implementation to the way that modern shared memory systems operate can improve performance dramatically. The resulting optimized algorithm costs only one round trip message latency to perform a barrier operation across N processors. Using a cycle-accurate execution-driven simulator of a future-generation SGI multiprocessor, we show that the proposed queue-based barrier outperforms conventional barrier implementations based on load-linked/store-conditional instructions by a factor of 5.43 (on 4 processors) to 93.96 (on 256 processors).
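For orientation, the kind of conventional barrier that the abstract uses as its baseline can be sketched as a centralized, sense-reversing barrier. This is a minimal illustration and not the paper's queue-based algorithm: the class name `CentralBarrier`, the helper `run_demo`, and the use of a Python lock in place of a load-linked/store-conditional atomic increment are all assumptions made here. Every arriving thread updates one shared counter and then spins on one shared flag, which is the pattern whose serialized memory traffic grows with the number of processors.

```python
import threading
import time


class CentralBarrier:
    """Sense-reversing centralized barrier (illustrative baseline only)."""

    def __init__(self, n):
        self.n = n                      # number of participating threads
        self.count = 0                  # arrivals in the current episode
        self.sense = False              # global release flag
        self.lock = threading.Lock()    # assumption: stands in for an LL/SC atomic update
        self.local = threading.local()  # per-thread private sense

    def wait(self):
        # Flip this thread's private sense for the new barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count += 1
            last = self.count == self.n
            if last:
                self.count = 0          # reset for the next episode
                self.sense = my_sense   # release all waiting threads
        if not last:
            while self.sense != my_sense:
                time.sleep(0)           # yield while spinning on the shared flag


def run_demo(n_threads=4, phases=3):
    """Hypothetical driver: each thread logs its phase, then waits at the barrier."""
    barrier = CentralBarrier(n_threads)
    log = []

    def worker():
        for phase in range(phases):
            log.append(phase)           # list.append is atomic in CPython
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

If the barrier works, no thread logs phase p+1 until every thread has logged phase p, so the returned log is already in sorted order.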
Similar resources
A polynomial-time algorithm for linear optimization based on a new simple kernel function
We present a new barrier function, based on a kernel function with a linear growth term and an inverse linear barrier term. Existing kernel functions have a quadratic (or higher degree) growth term, and a barrier term that is either transcendent (e.g. logarithmic) or of a more complicated algebraic form. So the new kernel function has the simplest possible form compared with all existing kernel...
Measurement of Complexity and Comprehension of a Program Through a Cognitive Approach
The inherent complexity of software systems creates problems in the software engineering industry. Numerous techniques have been designed to comprehend the fundamental characteristics of software systems. To understand the software, it is necessary to know the complexity level of the source code. Cognitive informatics plays an important role in better understanding the complexity o...
A Polynomial-Time Primal-Dual Affine Scaling Algorithm for Linear and Convex Quadratic Programming and Its Power Series Extension
We describe an algorithm for linear and convex quadratic programming problems that uses power series approximation of the weighted barrier path that passes through the current iterate in order to find the next iterate. If r > 1 is the order of approximation used, we show that our algorithm has time complexity $O(n^{\frac{1}{2}(1+1/r)} L^{(1+1/r)})$ iterations and $O(n^3 + n^2 r)$ arithmetic operations per iteration, w...
Symmetry Breaking in the Congest Model: Time- and Message-Efficient Algorithms for Ruling Sets
We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) has attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012; Ghaffari, SODA 2016) for the MIS problem have tried to break the long-sta...
Fully Dynamic Transitive Closure: Breaking Through the $O(n^2)$ Barrier
In this paper we introduce a general framework for casting fully dynamic transitive closure into the problem of reevaluating polynomials over matrices. With this technique, we improve the best known bounds for fully dynamic transitive closure. In particular, we devise a deterministic algorithm for general directed graphs that achieves $O(n^2)$ amortized time for updates, while preserving unit wors...