Distributed Storage of Large-Scale Multidimensional Electroencephalogram Data Using Hadoop and HBase

Authors

  • Haimonti Dutta
  • Alex Kamil
  • Manoj Pooleery
  • Simha Sethumadhavan
  • John Demme

Abstract

Huge volumes of data are being accumulated from a variety of sources in engineering and scientific disciplines; this has been referred to as the “Data Avalanche”. Cloud computing infrastructures, such as Amazon Elastic Compute Cloud (EC2), are specifically designed to combine high compute performance with high-performance network capability to meet the needs of data-intensive science. Reliable, scalable, distributed computing is used extensively on the cloud. Apache Hadoop is one such open-source project: it provides a distributed file system that creates multiple replicas of data blocks and distributes them across compute nodes throughout a cluster to enable reliable and rapid computations. Column-oriented databases built on Hadoop (such as HBase), along with the MapReduce programming paradigm, allow large-scale distributed computing applications to be developed with ease. In this chapter, benchmarking results on a small in-house Hadoop cluster composed of 29 nodes, each with 8-core processors, are presented along with a case study on distributed storage of electroencephalogram (EEG) data. Our results indicate that ...

Author affiliations

  • Haimonti Dutta: Center for Computational Learning Systems (CCLS), Columbia University, New York 10115
  • Alex Kamil: School of General Studies, Columbia University, New York, NY 10027
  • Manoj Pooleery: Center for Computational Learning Systems (CCLS), Columbia University, New York 10115
  • Simha Sethumadhavan: Computer Architecture Laboratory, Department of Computer Science, Columbia University, New York, 10115
  • John Demme: Department of Computer Science, Columbia University, New York, 10115
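
The MapReduce paradigm mentioned in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' code and does not use Hadoop or HBase; it only mimics the map, shuffle, and reduce phases in plain Python. The record layout, the channel names, the amplitude values, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical EEG records: (channel name, sample index, amplitude in microvolts).
# These values are invented for illustration and are not from the chapter.
RECORDS = [
    ("Fp1", 0, 10.0),
    ("Fp1", 1, 14.0),
    ("Cz", 0, -4.0),
    ("Cz", 1, 0.0),
    ("Cz", 2, 4.0),
]


def map_phase(record):
    """Map step: emit one (key, value) pair per record, keyed by channel."""
    channel, _index, amplitude = record
    yield channel, amplitude


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a mean amplitude."""
    return key, sum(values) / len(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real cluster the map and reduce functions would run on different nodes and the framework would handle the shuffle; here everything runs locally so the data flow between the three phases is easy to follow.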

Similar resources

HBase and Hypertable for large scale distributed storage systems: A Performance evaluation for Open Source BigTable Implementations

BigTable is a distributed storage system developed at Google for managing structured data, with the capability to scale to a very large size: petabytes of data across thousands of commodity servers. At present, there exist two open-source implementations that closely emulate most of the components of Google’s BigTable, namely HBase and Hypertable. HBase is written in Java and provides BigTable-like ...

Improving Efficiency and Time Complexity of Big Data Mining using Apache Hadoop with HBase storage model

Data mining is the science of extracting knowledge from raw data and applying it to improve industrial rules. Mining “big data” requires new approaches, algorithms, techniques, and analytics to extract knowledge from it. Day by day, a huge amount of data is generated and its usage is expanding. BIGDATA is a popular term used to describe the...

MHBase: A Distributed Real-Time Query Scheme for Meteorological Data Based on HBase

Meteorological technology has evolved rapidly in recent years to provide enormous, accurate and personalized advantages in the public service. Large volumes of observational data are generated gradually by technologies such as geographical remote sensing, meteorological radar satellite, etc., which makes data analysis in weather forecasting more precise but also poses a threat to the traditional ...

Scalable Inverted Indexing on NoSQL Table Storage

The development of data-intensive problems in recent years has brought new requirements and challenges to storage and computing infrastructures. Researchers are not only doing batch loading and processing of large-scale data, but also demanding the capabilities of incremental updates and interactive analysis. Therefore, extending existing storage systems to handle these new requirements beco...

Distributed RDF Triple Store Using HBase and Hive

The growth of web data has presented new challenges regarding the ability to effectively query RDF data. Traditional relational database systems efficiently scale and query distributed data. With the development of Hadoop and its implementation of the MapReduce framework, along with HBase, a NoSQL data store, the semantics of processing and querying data have changed. Given the existing structure of ...

Publication date: 2011