Distributed Parameter Map-Reduce

Author

  • Qi Li
Abstract

This paper describes how to convert a machine learning problem into a series of map-reduce tasks. We study the logistic regression algorithm. Logistic regression assumes that samples are independent and assigns each sample a probability. The parameters are obtained by maximizing the product of all sample probabilities. The rapid growth of training data poses challenges for machine learning methods: there are so many training samples that they can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression is inference. In the map-reduce spirit, each sample performs inference in a separate map procedure, but inference requires that the map procedure hold the parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only the samples but also the parameters are distributed across the nodes of the distributed file system. Through a series of map-reduce tasks, we assign each sample the parameters for its features, perform inference for the sample, and update the parameters of the model. These steps are executed iteratively until convergence. We test the proposed algorithm in an actual Hadoop production environment. Experiments show that the speedup of the algorithm grows linearly with the number of cluster nodes.
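The pipeline described above can be illustrated with a small in-memory simulation. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions. The three jobs are hypothetical, as are the helper run_mapreduce and every function name:

1. Join parameters to samples by feature.
2. Run inference per sample.
3. Aggregate gradient terms and update each parameter by feature.

The update rule is plain batch gradient ascent, because the abstract does not name one. "Until convergence" is approximated by a tolerance on the largest weight change plus a round limit.

For a sample x_i with label y_i, let p_i = σ(w·x_i). The log-likelihood of the product of sample probabilities is Σ_i [y_i log p_i + (1 - y_i) log(1 - p_i)], and its gradient with respect to w_j is Σ_i (y_i - p_i) x_ij. The second job emits exactly this per-sample term, feature by feature. The four-sample dataset is made up for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: a sample is (sample_id, label in {0, 1}, {feature: value});
# a parameter record is (feature, weight).
Sample = Tuple[str, int, Dict[str, float]]

def run_mapreduce(records: Iterable, mapper: Callable, reducer: Callable) -> List:
    """In-memory stand-in for one map-reduce job (hypothetical, not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):    # map phase
            groups[key].append(value)        # shuffle: group values by key
    output = []
    for key, values in groups.items():       # reduce phase
        output.extend(reducer(key, values))
    return output

# Job 1: join each sample with the parameters of its features, keyed by feature.
def join_mapper(record):
    tag, payload = record
    if tag == "P":
        feature, weight = payload
        yield feature, ("P", weight)
    else:
        sample_id, label, features = payload
        for feature, value in features.items():
            yield feature, ("S", (sample_id, label, value))

def join_reducer(feature, values):
    weight = 0.0  # a feature with no stored parameter yet starts at zero
    occurrences = []
    for tag, item in values:
        if tag == "P":
            weight = item
        else:
            occurrences.append(item)
    for sample_id, label, value in occurrences:
        yield sample_id, label, feature, value, weight

# Job 2: regroup by sample, run inference, emit per-feature gradient terms.
def infer_mapper(record):
    sample_id, label, feature, value, weight = record
    yield sample_id, (label, feature, value, weight)

def infer_reducer(sample_id, values):
    label = values[0][0]
    score = sum(value * weight for _, _, value, weight in values)
    p = 1.0 / (1.0 + math.exp(-score))      # P(y = 1 | x)
    for _, feature, value, _ in values:
        yield feature, (label - p) * value  # this sample's term of d(log-lik)/d(w_feature)

# Job 3: regroup by feature, sum gradient terms, take one gradient-ascent step.
def update_mapper(record):
    tag, (feature, number) = record
    yield feature, (tag, number)

def make_update_reducer(lr: float):
    def update_reducer(feature, values):
        weight = sum(v for tag, v in values if tag == "P")  # old weight, 0.0 if new
        grad = sum(v for tag, v in values if tag == "G")
        yield feature, weight + lr * grad
    return update_reducer

def train(samples: List[Sample], lr: float = 0.1, tol: float = 1e-4,
          max_rounds: int = 200) -> Dict[str, float]:
    params: List[Tuple[str, float]] = []  # parameter records, never held as one vector by a task
    for _ in range(max_rounds):
        joined = run_mapreduce([("S", s) for s in samples] + [("P", p) for p in params],
                               join_mapper, join_reducer)
        grads = run_mapreduce(joined, infer_mapper, infer_reducer)
        new_params = run_mapreduce([("G", g) for g in grads] + [("P", p) for p in params],
                                   update_mapper, make_update_reducer(lr))
        old = dict(params)
        change = max((abs(w - old.get(f, 0.0)) for f, w in new_params), default=0.0)
        params = new_params
        if change < tol:  # stand-in for "until convergence"
            break
    return dict(params)

data = [
    ("s1", 1, {"bias": 1.0, "x": 2.0}),
    ("s2", 1, {"bias": 1.0, "x": 1.5}),
    ("s3", 0, {"bias": 1.0, "x": -1.0}),
    ("s4", 0, {"bias": 1.0, "x": -2.5}),
]
w = train(data)
print(w)
```

In this sketch, no single mapper or reducer ever holds the full parameter vector. Job 1 attaches to each sample only the weights of its own features, and job 3 updates each weight in the reducer that owns that feature. The driver-side dictionary in train is used only for the convergence check.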

Similar sources

Market Basket Analysis Algorithm on Map/Reduce in AWS EC2

As the web, social networking, and smartphone applications have become popular, data has grown drastically every day. Such data is called Big Data. Google encountered Big Data earlier than others and recognized the importance of storing and computing on it. Thus, Google implemented its parallel computing platform with the Map/Reduce approach on the Google File System (GFS) in ord...

Analysing Distributed Big Data through Hadoop Map Reduce

This term paper focuses on how big data is analysed in a distributed environment through Hadoop Map Reduce. Big Data is the same as “small data” but bigger in size, and so it is approached in different ways. Storing Big Data requires analysing the characteristics of the data. Big Data can be processed using Hadoop Map Reduce. Map Reduce is a programming model working in parallel for large c...

An Optimization of Gu Map-1

As a modified version of the GGH map, Gu map-1 was successful in constructing multi-party key exchange (MPKE). In this short paper we present a result about the parameter setting of Gu map-1, with which we can reduce a key parameter τ from the original O(n²) down to O(λn) (in the theoretically secure case, where λ is the security parameter), and even down to O(2n) (in the computationally secure case). Such optim...

A Simplified Model of Distributed Parameter Systems

A generalized simplified model for describing the dynamic behavior of distributed parameter systems is proposed. The various specific characteristics of gain and phase angle of distributed parameter systems are investigated from frequency response formulation and complex plane representation of the proposed simplified model. The complex plane investigation renders some important inequality cons...

Survey on Load Balancing and Data Skew Mitigation in Mapreduce Applications

Over the past few years, the Map Reduce programming model has shown great success in processing huge amounts of data. Map Reduce is a framework for data-intensive distributed computing of batch jobs. This data-intensive processing creates skew in the Map Reduce framework and degrades performance considerably. This leads to greatly varying execution times for Map Reduce jobs. Due to this varying execution t...

Journal:
  • CoRR

Volume abs/1510.00817

Publication date 2015