SystemML: Declarative Machine Learning on Spark

Authors

  • Matthias Boehm
  • Michael Dusenberry
  • Deron Eriksson
  • Alexandre V. Evfimievski
  • Faraz Makari Manshadi
  • Niketan Pansare
  • Berthold Reinwald
  • Frederick Reiss
  • Prithviraj Sen
  • Arvind Surve
  • Shirish Tatikonda
Abstract

The rising need for custom machine learning (ML) algorithms and the growing data sizes that require the exploitation of distributed, data-parallel frameworks such as MapReduce or Spark pose significant productivity challenges to data scientists. Apache SystemML addresses these challenges through declarative ML by (1) increasing the productivity of data scientists as they are able to express custom algorithms in a familiar domain-specific language covering linear algebra primitives and statistical functions, and (2) transparently running these ML algorithms on distributed, data-parallel frameworks by applying cost-based compilation techniques to generate efficient, low-level execution plans with in-memory single-node and large-scale distributed operations. This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics. We also share lessons learned from porting SystemML to Spark and declarative ML in general. Finally, SystemML is open source, which allows the database community to leverage it as a testbed for further research.
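The cost-based choice between in-memory single-node and distributed operations mentioned in the abstract can be illustrated with a toy sketch. This is not SystemML's compiler or API: the names `MatrixShape`, `dense_size_bytes`, and `choose_matmul_backend`, the dense 8-bytes-per-cell size estimate, and the single memory budget are all assumptions made for illustration only.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: dense matrices of 64-bit floats


@dataclass(frozen=True)
class MatrixShape:
    """Hypothetical descriptor for the dimensions of a matrix operand."""
    rows: int
    cols: int


def dense_size_bytes(shape: MatrixShape) -> int:
    """Worst-case dense size estimate in bytes (ignores sparsity and headers)."""
    return shape.rows * shape.cols * BYTES_PER_DOUBLE


def choose_matmul_backend(left: MatrixShape, right: MatrixShape,
                          memory_budget_bytes: int) -> str:
    """Pick an execution backend for left %*% right.

    Illustrative rule: if both inputs and the output fit into the memory
    budget, run in memory on a single node; otherwise fall back to a
    distributed operation.
    """
    if left.cols != right.rows:
        raise ValueError("inner dimensions do not match")
    output = MatrixShape(left.rows, right.cols)
    estimate = (dense_size_bytes(left)
                + dense_size_bytes(right)
                + dense_size_bytes(output))
    return "single_node" if estimate <= memory_budget_bytes else "distributed"
```

Under these assumptions, a 1,000 x 100 matrix multiplied by a 100 x 1 vector stays on a single node with a 1 GB budget, while a 10,000,000 x 1,000 matrix does not fit and is routed to the distributed backend. A real optimizer would also account for sparsity, intermediate results, and cluster characteristics.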

Related resources

Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs

Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks like Spark. The compilation of large-scale ML programs exhibits many opportunities for automatic optimizati...

Deep Learning with Apache SystemML

Deep Learning (DL) is a subfield of Machine Learning (ML) that focuses on learning hierarchical representations of data with multiple levels of abstraction using neural networks [15]. Recent advances in deep learning are made possible due to the availability of large amounts of labeled data, use of GPGPU compute, and application of new techniques (such as ReLU, batch normalization [12], dropout...

Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

SystemML aims at declarative, large-scale machine learning (ML) on top of MapReduce, where high-level ML scripts with R-like syntax are compiled to programs of MR jobs. The declarative specification of ML algorithms enables, in contrast to existing large-scale machine learning libraries, automatic optimization. SystemML’s primary focus is on data parallelism but many ML algorithms inherently exh...

SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs

SystemML enables declarative, large-scale machine learning (ML) via a high-level language with R-like syntax. Data scientists use this language to express their ML algorithms with full flexibility but without the need to hand-tune distributed runtime execution plans and system configurations. These ML programs are dynamically compiled and optimized based on data and cluster characteristics usin...

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused operators—in terms of fused chains of basic operators—are ubiquitous. These opportunities include (1) fewer materialized intermediates, (2) fewer scans of input d...

Journal:
  • PVLDB

Volume 9  Issue 

Pages  -

Publication date: 2016