Propeller: A Scalable Metadata Organization for A Versatile Searchable File System

Authors

  • Lei Xu
  • Hong Jiang
  • Xue Liu
  • Lei Tian
  • Yu Hua
  • Jian Hu
Abstract

The exponentially increasing amount of data in file systems has made it increasingly important for file systems to provide fast file-search services. The quality of file-search services is significantly affected by the file-index overhead, the file-search responsiveness, and the accuracy of search results. Unfortunately, existing file-search solutions either scale so poorly that their performance degrades unacceptably as systems grow, or incur such long crawling delays that they produce unacceptably inaccurate results. We believe that the time is ripe to redesign a searchable file system capable of accurate and scalable system-level file search. The main challenge in designing and implementing such a file system is updating file indices in a real-time and scalable way so that file-search results are accurate. We therefore propose Propeller, a lightweight and scalable metadata organization for the envisioned searchable file system. Propeller partitions the namespace according to file-access patterns, which exposes massive parallelism for emerging manycore architectures to support future searchable file systems. Extensive evaluation of our Propeller prototype shows that it achieves significantly better file-indexing and file-search performance (up to 250×) than a centralized solution (MySQL), while incurring only negligible overhead (< 16%) on normal file I/O operations and delivering faster direct-access performance (16.6×) on a state-of-the-art file system (Ext4). Furthermore, its 100% recall accuracy makes Propeller a feasible metadata scheme for system-level file-search services.
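
The idea of partitioning a namespace by access pattern can be illustrated with a toy sketch. This is not Propeller's implementation: the abstract does not specify the partitioning algorithm, so the grouping rule (files are grouped by the first process seen accessing them), the `partition_by_access` function, the `PartitionedIndex` class, and the size-based query are all assumptions made for illustration only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_access(access_log):
    """Map each path to a group, using the accessing process as a stand-in
    for an access pattern (an assumption, not the paper's method)."""
    group_of = {}
    for process, path in access_log:
        # The first process seen touching a path decides its group.
        group_of.setdefault(path, process)
    return group_of


class PartitionedIndex:
    """Hypothetical per-group metadata index: one {path: size} dict per group."""

    def __init__(self, group_of):
        self.group_of = group_of
        self.partitions = defaultdict(dict)

    def update(self, path, size):
        # An update touches only the partition that owns the path.
        group = self.group_of.get(path, "default")
        self.partitions[group][path] = size

    def search(self, min_size):
        # Partitions are independent, so each one can be scanned in parallel.
        def scan(index):
            return [p for p, s in index.items() if s >= min_size]

        with ThreadPoolExecutor() as pool:
            chunks = pool.map(scan, list(self.partitions.values()))
        return sorted(p for chunk in chunks for p in chunk)
```

In this sketch an index update is confined to a single small partition, and a query fans out over all partitions independently, which is the kind of parallelism the abstract alludes to.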

Similar resources

VSFS: A Versatile Searchable File System for HPC Analytics

Emerging HPC analytics applications urgently demand file-search services to drastically reduce the scale of the input data in real time, so that the speed of computation and data analytics can be greatly accelerated. Unfortunately, the existing file-search solutions are either poorly scalable for large-scale systems, or lack a well-integrated interface to allow applications to easily use them fo...

CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems

Existing file systems, even the most scalable systems that store hundreds of petabytes (or more) of data across thousands of machines, store file metadata on a single server or via a shared-disk architecture in order to ensure consistency and validity of the metadata. This paper describes a completely different approach for the design of replicated, scalable file systems, which leverages a high...

Magellan: A Searchable Metadata Architecture for Large-Scale File Systems

As file systems continue to grow, metadata search is becoming an increasingly important way to access and manage files. However, existing solutions that build a separate metadata database outside of the file system face consistency and management challenges at large scales. To address these issues, we developed Magellan, a new large-scale file system metadata architecture that enables the file ...

Scalable Performance of the Panasas Parallel File System

The Panasas file system uses parallel and redundant access to object storage devices (OSDs), per-file RAID, distributed metadata management, consistent client caching, file locking services, and internal cluster management to provide a scalable, fault-tolerant, high-performance distributed file system. The clustered design of the storage system and the use of client-driven RAID provide scalable ...

Copernicus: A Scalable, High-Performance Semantic File System

Hierarchical file systems do not effectively meet the needs of users at the petabyte-scale. Users need dynamic, search-based file access in order to properly manage and use their growing sea of data. This paper presents the design of Copernicus, a new scalable, semantic file system that provides a searchable namespace for billions of files. Instead of augmenting a traditional file system with a...

Publication date: 2013