Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI
Author
Abstract
MPAS-Ocean [4] is a component of the MPAS framework of climate models. It is an unstructured-mesh ocean model capable of using enhanced horizontal resolution in selected regions of the ocean domain. The code is publicly available for download [3] and comes with several input problems of different sizes, corresponding to different simulation resolutions. In this initial study, we look at the per-core performance of version 2.0 of the MPAS-Ocean code. Our analysis was performed on a single-node system with dual Intel Xeon E5-2690 CPUs, based on the Sandy Bridge micro-architecture. Each processor has 8 cores and a shared 20 MB L3 cache. We compiled the code with the Intel Fortran compiler 14.0.0 using the flags -O3 -g.
Similar resources
Application Performance Profiling on the Cray XD1 using HPCToolkit
HPCToolkit is an open-source suite of multi-platform tools for profile-based performance analysis of sequential and parallel applications. The toolkit consists of components for collecting performance measurements of fully-optimized executables without adding instrumentation, analyzing application binaries to understand the structure of optimized code, correlating measurements with program stru...
HPCTOOLKIT: tools for performance analysis of optimized parallel programs
HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully-optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting c...
Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers
The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to...
A Methodology for Accurate, Effective and Scalable Performance Analysis of Application Programs
We describe a unique and comprehensive methodology for accurately measuring and effectively analyzing the performance of an application’s execution. This methodology is 1) accurate, because it assiduously avoids systematic measurement error (such as that introduced by instrumentation); 2) effective, because it associates useful performance metrics (such as memory bandwidth) with important sourc...
The Design, Implementation, and Performance of a Parallel Ocean Circulation Model
We describe new parallelization techniques applied to a highly parallel ocean circulation application code, the Miami Isopycnic Ocean Coordinate Model or MICOM. We compare three parallel architectures executing MICOM: vector, massively parallel, and a multiprocessor workstation. Results from a high resolution 0.08° MICOM North Atlantic basin calculation on the Cray T3D are described briefly. Th...