Today’s hardware platforms offer parallel processing capabilities, and many programming models have been developed to exploit them. It is therefore necessary to investigate efficient implementations of compute-intensive applications on the available platforms. Dense matrix-matrix multiplication is an important kernel used in applications, yet it is computationally intensive, especially for large matrix sizes. To improve the perfor...