Optimizing Data Locality between the Swift Parallel Programming System and the FusionFS Distributed File System
Authors
Abstract
Many high-performance computing (HPC) systems use a centralized storage system that is separate from the compute system. This approach will not scale as we seek to achieve exascale performance [6]. Distributed file systems can provide the scalability needed for exascale computing. FusionFS is a distributed file system designed for HPC systems that achieves scalability in part by removing bottlenecks found in metadata management. Swift/T is a high-level, implicitly parallel scripting language for HPC systems that provides automated parallelism and load balancing on a massive scale. Additional optimizations can be achieved by using the features of FusionFS and Swift/T together to take advantage of data locality. In this paper, we look at using Swift/T's language features to optimize locality in FusionFS.
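The locality idea in the abstract can be illustrated with a minimal sketch. It assumes that the node holding each file is known and that a scheduler can choose where a task runs. The function `place_tasks`, its parameters, and the node and file names are hypothetical and written only for illustration; they are not part of the Swift/T or FusionFS APIs, and this is not the method used in the paper.

```python
def place_tasks(task_inputs, file_locations, nodes):
    """Assign each task to the node that stores its input file, when known.

    task_inputs:    dict mapping task name -> input file path (hypothetical)
    file_locations: dict mapping file path -> node holding the file locally
    nodes:          list of all compute nodes
    Tasks whose input has no known location go to the least-loaded node.
    """
    load = {node: 0 for node in nodes}
    placement = {}
    for task, path in task_inputs.items():
        node = file_locations.get(path)
        if node not in load:
            # No locality information: fall back to simple load balancing.
            node = min(nodes, key=lambda n: load[n])
        placement[task] = node
        load[node] += 1
    return placement


placement = place_tasks(
    {"t1": "/a", "t2": "/b", "t3": "/c"},
    {"/a": "node1", "/b": "node0"},
    ["node0", "node1", "node2"],
)
```

Here tasks `t1` and `t2` run where their inputs already reside, so no data crosses the network, while `t3` has no known location and is placed by load alone.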
Similar resources
FusionFS: a distributed file system for large scale data-intensive computing
Today’s science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. Exascale computing, i.e. 10^18 FLOPS, is predicted to emerge by 2019 with current trends. Millions of nodes and billions of threads of execution, producing similarly large concurrent data accesses, are ...
JFusionFS A Java Implementation of FusionFS
FusionFS is a node-local distributed storage system that was developed for High Performance Computing systems. FusionFS disperses metadata to all available compute nodes through the use of a distributed hash table (DHT), and thus overcomes the metadata problem common in many storage systems. FusionFS also relies on a parallel file system (PFS) which acts as a large file store (LFS) when the fil...
FusionProv: Towards a Provenance-Aware Distributed Filesystem
It has become increasingly important to capture and understand the origins and derivation of data (its provenance). A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability. In this paper, we explore the feasibility of a management layer for parallel file systems, in which metadata includes both file operations and provenance metadata. We desig...
FusionFS: Enabling Distributed Indexing And Text Search
This project will focus on extending the functionality of FusionFS[1] to enable file-system-wide text indexing and searching capabilities. It will build on existing indexing libraries and use the distributed architecture to enable fast text searching across a distributed file system.
Swift: A language for distributed parallel scripting
Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming compl...