Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms
Abstract
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems. Splash consists of a programming interface and an execution engine. Using the programming interface, the user develops sequential stochastic algorithms without having to consider any details of distributed computing. The algorithm is then automatically parallelized by a communication-efficient execution engine. We provide theoretical justification for the optimal rate of convergence when parallelizing stochastic gradient descent. Experiments on real data with stochastic gradient descent, collapsed Gibbs sampling, stochastic variational inference and stochastic collaborative filtering verify that Splash yields order-of-magnitude speedups over single-threaded stochastic algorithms and over parallelized batch algorithms. Besides its efficiency, Splash provides a rich collection of interfaces for algorithm implementation. It is built on Apache Spark and is closely integrated with the Spark ecosystem.
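The workflow described in the abstract (write a sequential update, let an engine run it on data partitions and combine the results) can be illustrated with a minimal sketch. This is not Splash's API or its combining strategy, neither of which the abstract specifies. The names `sgd_pass` and `parallel_sgd` are hypothetical, the workers are simulated one after another in a single process, and plain averaging of local results is an assumption made only for illustration.

```python
import random

def sgd_pass(w, samples, lr):
    """Sequential SGD for 1-D least squares (fit y = w * x); returns the updated w.
    This is the only piece a user would write in this sketch."""
    for x, y in samples:
        grad = 2.0 * (w * x - y) * x  # gradient of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w

def parallel_sgd(data, num_workers=4, rounds=20, lr=0.05):
    """Hypothetical 'engine': each round, every worker runs the sequential
    pass on its own partition starting from the shared w, then the local
    results are averaged (an assumed combining rule, not Splash's)."""
    partitions = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(rounds):
        # Workers are simulated sequentially here; a real engine would run them in parallel.
        local = [sgd_pass(w, part, lr) for part in partitions]
        w = sum(local) / len(local)
    return w

# Synthetic data with true slope 3.0 and small Gaussian noise.
rng = random.Random(0)
data = [(x, 3.0 * x + rng.gauss(0.0, 0.01))
        for x in (rng.uniform(-1.0, 1.0) for _ in range(400))]
w_hat = parallel_sgd(data)
print(w_hat)
```

The point of the sketch is the separation of concerns: `sgd_pass` knows nothing about partitions or workers, while `parallel_sgd` knows nothing about the loss being minimized.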
Related resources
Exdasy - A User-Friendly and Extendable Data Distribution System
This paper introduces Exdasy, a user-friendly and extendable software tool for partitioning unstructured meshes and mapping mesh partitions to parallel computers. Exdasy was designed to meet the increasing demands on today's data distribution systems, which are posed by the variety of mesh computations, the ongoing development of distribution algorithms and rapid changes in parallel hardware te...
An object-oriented software implementation of a novel cuckoo search algorithm
This paper presents an object-oriented software system that implements a cuckoo search (CS) metaheuristic for unconstrained optimization problems. Yang and Deb developed the cuckoo search algorithm in MATLAB and tested it on some standard benchmark functions as well as on some engineering optimization problems, where it showed promising results. We developed our algorithm in the JAVA programming languag...
Parallelizing Big Data Machine Learning Algorithms with Model Rotation
This paper investigates a novel approach to parallelization of machine learning algorithms using model rotation as an effective parallel computation model. We identify the importance of model rotation owing to its ability to shift the latest model updates to a neighboring computation, thereby guaranteeing model consistency which is hard to achieve in other computation models. We distinguish com...
Cambridge Rocketry Simulator – A Stochastic Six-Degrees-of-Freedom Rocket Flight Simulator
The Cambridge Rocketry Simulator can be used to simulate the flight of unguided rockets for both design and operational applications. The software consists of three parts: The first part is a GUI that enables the user to design a rocket. The second part is a verified and peer-reviewed physics model that simulates the rocket flight. This includes a Monte Carlo wrapper to model the uncertainty in...
A stochastic model for project selection and scheduling problem
Resource limitations at time zero may cause some profitable projects not to be selected in the project selection problem; thus the simultaneous project portfolio selection and scheduling problem has received significant attention. In this study, budget, investment costs and earnings are considered to be stochastic. The objectives are maximizing net present values of selected projects and minimizing v...
Journal title:
- CoRR
Volume: abs/1506.07552
Publication date: 2015