Abstract

We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse operations on multi-n...