Understanding the effects of process placement on application performance on an AMD Interlagos processor

Authors

  • Kalyana Chadalavada
  • Manisha Gajbe
Abstract

In this paper, we explore the impact of process placement on application performance when using an AMD Opteron Bulldozer-architecture CPU on a Cray XE6 node. We conduct a low-level analysis of possible resource contention within the Interlagos core modules, using application kernels that exemplify target workloads. We also characterize the performance of OpenMP threads in dual-stream (packed) mode and in single-stream (unpacked) mode. Using CrayPat tools and PAPI counters, we attempt to quantify the bottlenecks that prevent efficient utilization of the processors.
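The difference between the packed and unpacked configurations can be illustrated with a small sketch. It is not taken from the paper: the helper `placement` is hypothetical, and it assumes that core IDs are numbered so that cores 2k and 2k+1 share one Bulldozer module, and that a node exposes 16 modules (32 integer cores). Both assumptions should be checked against the actual node topology before use.

```python
def placement(num_threads, mode, cores_per_module=2, num_modules=16):
    """Return the core IDs that threads would be pinned to.

    Assumption (not from the paper): consecutive core IDs share a module,
    i.e. module(core) == core // cores_per_module.
    """
    if mode == "packed":
        # Dual stream: fill both integer cores of each module.
        stride = 1
        limit = cores_per_module * num_modules
    elif mode == "unpacked":
        # Single stream: one thread per module, the sibling core stays idle.
        stride = cores_per_module
        limit = num_modules
    else:
        raise ValueError("mode must be 'packed' or 'unpacked'")
    if num_threads > limit:
        raise ValueError("not enough cores for the requested mode")
    return [i * stride for i in range(num_threads)]


def modules_used(core_ids, cores_per_module=2):
    """Return the sorted set of module indices touched by the given cores."""
    return sorted({core // cores_per_module for core in core_ids})


packed = placement(4, "packed")
unpacked = placement(4, "unpacked")
print(packed, modules_used(packed))
print(unpacked, modules_used(unpacked))
```

Under these assumptions, four packed threads share two modules, while four unpacked threads each get a module to themselves, which is the kind of shared-resource contention the abstract sets out to measure.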

Similar resources

Application of Multi Objective HFAPSO algorithm for Simultaneous Placement of DG, Capacitor and Protective Device in Radial Distribution Network

In this paper, simultaneous placement of distributed generation, capacitor banks and protective devices is utilized to improve the efficiency of the distribution network. The objectives of the problem are reduction of active and reactive power losses, improvement of voltage profile and reliability indices and increasing distribution companies’ profit. The combination of firefly algorithm, parti...

Characterizing Compiler Performance for the AMD Opteron Processor on a Parallel Platform

Application performance on a high-performance parallel platform depends on a variety of factors, the most important being the performance of the high-speed interconnect and the compute node processor. The performance of the compute processor depends on how well the compiler optimizes for a given processor architecture, and how well it optimizes the application's source code. An analysis of uni-...

Benchmarking CMSSW on Intel and AMD single-core, dual-core and quad-core systems

We have benchmarked dual-processor quad-core AMD Opteron 2350 and 2356, dual-processor quad-core Intel Xeon E5345, single processor quad-core Intel Xeon X5472, dual-processor dual-core AMD Opteron 2214, dual-processor single-core Intel Xeon EM64T and single-processor single-core Intel Xeon EM64T systems using a CMSSW event simulation and reconstruction application. The results are presented in ...

Performance Consistency on Multi-socket AMD Opteron Systems

Compute nodes with multiple sockets, each of which has multiple cores, are starting to dominate in the area of scientific computing clusters. Performance inconsistencies from one execution to the next make any performance debugging or tuning difficult. The resulting performance inconsistencies are bigger for memory-bound applications but still noticeable for all but the most compute-intensive ap...

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

Publication date: 2012