Verifying Equivalence of Spark Programs

Authors

  • Shelly Grossman
  • Sara Cohen
  • Shachar Itzhaky
  • Noam Rinetzky
  • Shmuel Sagiv

Abstract

Apache Spark is a popular framework for writing large-scale data processing applications. Our long-term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDFs). In this paper, we present a novel SMT-based technique for verifying the equivalence of Spark programs. We model Spark as a programming language whose semantics imitates Relational Algebra queries (with aggregations) over bags (multisets) and allows for UDFs expressible in Presburger arithmetic. We prove that the problem of checking equivalence is undecidable even for programs that use a single aggregation operator. Thus, we present sound techniques for verifying the equivalence of interesting classes of Spark programs, and show that they are complete under certain restrictions. We implemented our technique and applied it to a few small, but intricate, test cases.
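
To make the setting concrete, the following is a minimal sketch of what "equivalent programs over bags" means. It is not the SMT-based technique of the paper: it models bags with Python's `collections.Counter` and compares two hypothetical programs by brute force on small inputs, which can find counterexamples but cannot prove equivalence. The names `bag_map`, `bag_filter`, `program_a`, `program_b`, and `agree_on_small_bags` are illustrative assumptions, and the UDFs (`2 * x`, `x >= 5`) are chosen only because they are linear integer expressions.

```python
from collections import Counter
from itertools import combinations_with_replacement


def bag_map(f, bag):
    """Apply f to every element of a bag, preserving multiplicities."""
    out = Counter()
    for x, n in bag.items():
        out[f(x)] += n
    return out


def bag_filter(p, bag):
    """Keep the elements of a bag that satisfy predicate p."""
    return Counter({x: n for x, n in bag.items() if p(x)})


def program_a(bag):
    # Hypothetical program: double every element, then keep results >= 10.
    return bag_filter(lambda y: y >= 10, bag_map(lambda x: 2 * x, bag))


def program_b(bag):
    # Hypothetical program: keep elements >= 5, then double them.
    return bag_map(lambda x: 2 * x, bag_filter(lambda x: x >= 5, bag))


def agree_on_small_bags(p, q, values=range(-8, 9), max_size=3):
    """Bounded check: compare p and q on every bag of up to max_size
    elements drawn from values. Agreement here is evidence only,
    not a proof of equivalence."""
    for size in range(max_size + 1):
        for elems in combinations_with_replacement(values, size):
            bag = Counter(elems)
            if p(bag) != q(bag):
                return False
    return True
```

Calling `agree_on_small_bags(program_a, program_b)` checks the two programs on every small bag in the chosen range. A verifier in the spirit of the abstract would instead establish the same fact for all inputs by reasoning symbolically about the UDFs.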

Similar resources

Verifying Equivalence of Spark Programs Technical Report 1-Nov-2016

Spark is a popular framework for writing large-scale data processing applications. Our goal is to develop tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations with User Defined Functions (UDFs). We present the first technique for verifying the equivalence of Spark programs. We model S...

Experimental Study of Performance of Spark Ignition Engine with Gasoline and Natural Gas

The tests were carried out with the spark timing adjusted to the maximum brake torque timing in various equivalence ratios and engine speeds for gasoline and natural gas operations. In this work, the lower heating value of gasoline is about 13.6% higher than that of natural gas. Based on the experimental results, the natural gas operation causes an increase of about 6.2% brake special fuel consumpt...

Verifying the Equivalence of Logic Programs in the Disjunctive Case

  • We consider (weak) equivalence of disjunctive logic programs.
  • We have previously developed an automated translation-based method for verifying the equivalence of programs supported by the smodels system.
  • P ≡s Q ⟹ P ≡ Q (by setting R = ∅), but P ≡ Q ⇏ P ≡s Q.
  • Whether P ≡ Q holds, remains open whenever P ≢s Q holds ⟹ Verifying P ≡ Q remains as a problem of its own.
  • Complexity resul...

Proving Equivalence Between Imperative and MapReduce Implementations Using Program Transformations

Distributed programs are often formulated in popular functional frameworks like MapReduce, Spark and Thrill, but writing efficient algorithms for such frameworks is usually a non-trivial task. As the costs of running faulty algorithms at scale can be severe, it is highly desirable to verify their correctness. We propose to employ existing imperative reference implementations as specifications f...

Inlining External Sources in Answer Set Programs

HEX-programs are an extension of answer set programming (ASP) towards external sources. To this end, external atoms provide a bidirectional interface between the program and an external source. The traditional evaluation algorithm for HEX-programs is based on guessing truth values of external atoms and verifying them by explicit calls of the external source. The approach was optimized by techni...

Publication date: 2017