2 Shared Memory Implementation

Authors

  • Razvan Carbunescu
  • Andrew Gearhart
  • Mehrzad Tartibi
Abstract

For n particles, the original code requires O(n²) time because at each time step the apply_force function is called for each particle (to be updated) with every other particle. Since interactions are local, each particle is influenced by only a few nearby particles. Thus, most of the time, apply_force computes a distance and returns after finding it to be greater than the local force cutoff. Since the density is constant (the size of the domain grows with n), the total number of local interactions is O(n). To run the simulation in O(n) time, apply_force should be called only Cn times, for some constant C.

We created a data structure to index particles by their location within the domain. Given a particle at location (x, y), we can use the index to find which other particles are close to (x, y) and apply forces only from those nearby particles (rather than from all particles). We partitioned the domain into a 2-D array of square “bins” with side length equal to the local interaction cutoff, so that a particle can only possibly interact with particles in its own bin or in one of the 8 neighboring bins. We used an array of STL vectors to store pointers to the particles in each bin. Since the size s (one side) of the domain is proportional to √n and the number of bins B is proportional to s², B is proportional to n (the constant turns out to be about 5, depending on the fringe).

Only pointers to particles need be stored (rather than copies) because apply_force reads the positions of particles and writes the acceleration components. Thus, as long as no particles are moved before all particles’ accelerations are computed, no data copying is necessary. For each time step, our serial algorithm therefore does four things: clear the bins, assign particles (i.e., pointers to particles) to bins, compute forces (only between particles in neighboring bins), and move the particles.
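The binning scheme described above can be sketched in C++ as follows. This is a minimal illustration, not the authors' code: the particle_t layout, the bin_grid type, the cutoff value, and the repulsive force inside apply_force are all assumptions made for the example. The sketch also makes each bin at least as wide as the cutoff (rather than exactly equal to it) so that the bins tile the domain evenly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle layout: positions are read, accelerations are written.
struct particle_t { double x, y, vx, vy, ax, ay; };

const double cutoff = 0.01;  // assumed interaction cutoff, for illustration only

// 2-D grid of bins; each bin holds pointers to particles, never copies.
struct bin_grid {
    int nbins;                                    // bins per side
    double bin_size;                              // side length, >= cutoff
    std::vector<std::vector<particle_t*>> bins;   // nbins * nbins, row-major

    explicit bin_grid(double domain_size)
        : nbins(std::max(1, static_cast<int>(domain_size / cutoff))),
          bin_size(domain_size / nbins),
          bins(static_cast<std::size_t>(nbins) * nbins) {
        assert(domain_size > 0.0);
    }

    // Map a coordinate to a bin index, clamping points on the domain boundary.
    int coord(double v) const {
        return std::min(std::max(static_cast<int>(v / bin_size), 0), nbins - 1);
    }

    std::vector<particle_t*>& at(int bx, int by) {
        return bins[static_cast<std::size_t>(by) * nbins + bx];
    }

    // Steps 1 and 2 of a time step: clear the bins, then reassign every particle.
    void rebuild(std::vector<particle_t>& ps) {
        for (auto& b : bins) b.clear();
        for (auto& p : ps) at(coord(p.x), coord(p.y)).push_back(&p);
    }
};

// Placeholder force kernel (NOT the original force law): reads q's position,
// writes only p's acceleration, and returns early outside the cutoff.
void apply_force(particle_t& p, const particle_t& q) {
    double dx = q.x - p.x, dy = q.y - p.y;
    double r2 = dx * dx + dy * dy;
    if (r2 > cutoff * cutoff || r2 == 0.0) return;
    p.ax -= dx / r2;  // push p away from q
    p.ay -= dy / r2;
}

// Step 3: visit only the particle's own bin and its (up to) 8 neighbors.
// Returns the number of apply_force calls made.
long compute_forces(bin_grid& g, std::vector<particle_t>& ps) {
    long calls = 0;
    for (auto& p : ps) {
        p.ax = p.ay = 0.0;
        int bx = g.coord(p.x), by = g.coord(p.y);
        for (int ny = std::max(by - 1, 0); ny <= std::min(by + 1, g.nbins - 1); ++ny)
            for (int nx = std::max(bx - 1, 0); nx <= std::min(bx + 1, g.nbins - 1); ++nx)
                for (particle_t* q : g.at(nx, ny))
                    if (q != &p) { apply_force(p, *q); ++calls; }
    }
    return calls;
}
```

Because the bins store pointers into the particle array, calling rebuild and then compute_forces touches no particle copies; positions must only be updated after compute_forces has finished for all particles.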
Clearing the bins requires B calls to the vector function clear(), assigning the particles to bins requires n calls to the (amortized) constant-time vector function push_back(), computing forces requires αn calls to apply_force (where α is the average number of particles in neighboring bins), and moving the particles requires n calls to the move function. Thus, as long as α is a constant (about 2.2 with the given density and cutoff constants), the algorithm runs in O(n) time. Figure 1 shows that the original code runs in O(n²) time and that our serial (and parallel) code runs in O(n) time.
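The scaling argument can be checked with a small counting experiment. The sketch below is an illustration under assumed constants (the density and cutoff values, the fixed random seed, and the helper names make_points, naive_checks, and binned_checks are not taken from the source). It stores particle indices rather than pointers for brevity and counts how many pair checks each approach performs when the domain area grows with n.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Assumed constants; the source only refers to "the given density and cutoff".
const double density = 0.0005;
const double cutoff  = 0.01;

struct point { double x, y; };

// Place n points uniformly in a square whose area grows with n (constant density).
std::vector<point> make_points(int n, double& size) {
    assert(n > 0);
    size = std::sqrt(density * n);
    std::mt19937 gen(12345);  // fixed seed so runs are repeatable
    std::uniform_real_distribution<double> u(0.0, size);
    std::vector<point> ps(n);
    for (auto& p : ps) { p.x = u(gen); p.y = u(gen); }
    return ps;
}

// Pair checks made by the all-pairs loop: each particle against every other one.
long long naive_checks(int n) { return static_cast<long long>(n) * (n - 1); }

// Pair checks made when each particle visits only its own and adjacent bins.
long long binned_checks(const std::vector<point>& ps, double size) {
    int nb = std::max(1, static_cast<int>(size / cutoff));  // bins per side
    double bs = size / nb;                                   // bin side, >= cutoff
    std::vector<std::vector<int>> bins(static_cast<std::size_t>(nb) * nb);
    auto coord = [&](double v) { return std::min(static_cast<int>(v / bs), nb - 1); };
    for (int i = 0; i < static_cast<int>(ps.size()); ++i)
        bins[static_cast<std::size_t>(coord(ps[i].y)) * nb + coord(ps[i].x)].push_back(i);

    long long checks = 0;
    for (const point& p : ps) {
        int bx = coord(p.x), by = coord(p.y);
        for (int ny = std::max(by - 1, 0); ny <= std::min(by + 1, nb - 1); ++ny)
            for (int nx = std::max(bx - 1, 0); nx <= std::min(bx + 1, nb - 1); ++nx)
                checks += static_cast<long long>(bins[static_cast<std::size_t>(ny) * nb + nx].size());
        checks -= 1;  // the particle's own entry does not count as a pair check
    }
    return checks;
}
```

With these assumed constants the number of bins works out to roughly 5n, in line with the constant quoted above. Quadrupling n should roughly quadruple the value returned by binned_checks, whereas naive_checks grows about sixteenfold.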

Publication date: 2009