With the deep development and application of Internet technology, data need to be processed more and more, when dealing with large amounts of data. Spark is a versatile high-performance and parallel computing framework, which can be applied to data mining. This paper is based on the parallelization of platforms’ K-means algorithm, by building a YARN cluster environment and making experiments to...