Any modern Graphics Processing Unit (graphics card) is a good platform for running massively parallel programs. Still, we lack tools to observe and measure the performance characteristics of GPU-based software. We state that, due to the complex memory hierarchy and the thousands of execution threads, all the issues are about the efficient use of the graphics card memory hierarchy. We propose the GPGPUSim simulator, previously used mostly for archit...