SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions
Abstract
A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach allows maximum flexibility as the output schema of the UDF is specified by the function itself at query plan-time. This means that a SQL/MR function is polymorphic. It can process arbitrary input because its behavior as well as output schema are dynamically determined by information available at query plan-time, such as the function’s input schema and arbitrary user-provided parameters. This also increases reusability as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters. In this paper we describe the motivation for this new approach to UDFs as well as the implementation within Aster Data Systems’ nCluster database. We demonstrate that in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework.
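To make the plan-time idea in the abstract concrete, here is a minimal Python sketch of a polymorphic, partition-wise function. It is not the SQL/MR API or nCluster code: the class `TopValuePerKey`, the method `operate_on_partition`, the helper `run_partitioned`, the `order_by` parameter, and the added `rows_seen` column are all hypothetical names chosen for illustration. The constructor stands in for query plan-time (it inspects the input schema and user parameters and fixes the output schema), and the driver stands in for a shared-nothing engine by processing each partition independently, here sequentially.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name) pairs


class TopValuePerKey:
    """Hypothetical function: per partition, keep the row with the largest
    value in a user-chosen column and append the partition's row count."""

    def __init__(self, input_schema: Schema, params: Dict[str, str]) -> None:
        # "Plan time" (assumed model): validate the user parameter against
        # whatever input schema this particular query supplies.
        names = [name for name, _ in input_schema]
        order_col = params["order_by"]
        if order_col not in names:
            raise ValueError(f"unknown column: {order_col}")
        self.order_idx = names.index(order_col)
        # The function itself decides its output schema.
        self.output_schema: Schema = list(input_schema) + [("rows_seen", "int")]

    def operate_on_partition(self, rows: Iterable[tuple]) -> Iterator[tuple]:
        # "Run time": sees only the rows of one partition.
        best = None
        count = 0
        for row in rows:
            count += 1
            if best is None or row[self.order_idx] > best[self.order_idx]:
                best = row
        if best is not None:
            yield best + (count,)


def run_partitioned(fn: TopValuePerKey, rows: List[tuple], key_idx: int) -> List[tuple]:
    """Stand-in for the engine: group rows by key, then run each partition
    on its own. A parallel system could hand partitions to different workers."""
    partitions: Dict[object, List[tuple]] = {}
    for row in rows:
        partitions.setdefault(row[key_idx], []).append(row)
    out: List[tuple] = []
    for key in sorted(partitions):
        out.extend(fn.operate_on_partition(partitions[key]))
    return out


# The same function applied to two inputs with different schemas.
clicks_schema = [("user_id", "int"), ("ts", "int"), ("url", "str")]
clicks = [(1, 10, "/a"), (1, 30, "/b"), (2, 5, "/c")]
clicks_fn = TopValuePerKey(clicks_schema, {"order_by": "ts"})
clicks_out = run_partitioned(clicks_fn, clicks, key_idx=0)

sales_schema = [("region", "str"), ("amount", "float")]
sales = [("east", 5.0), ("east", 7.5)]
sales_fn = TopValuePerKey(sales_schema, {"order_by": "amount"})
sales_out = run_partitioned(sales_fn, sales, key_idx=0)
```

Because the output schema is computed from the input schema rather than fixed when the function is installed, the same class works on both the click rows and the sales rows without modification.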
Similar resources
SQLScript: Efficiently Analyzing Big Enterprise Data in SAP HANA
Today, not only Internet companies such as Google, Facebook or Twitter do have Big Data but also Enterprise Information Systems store an ever growing amount of data (called Big Enterprise Data in this paper). In a classical SAP system landscape a central data warehouse (SAP BW) is used to integrate and analyze all enterprise data. In SAP BW most of the business logic required for complex analyt...
A Relational Approach to Complex Dataflows
Clouds have become an attractive platform for highly scalable processing of Big Data, especially due to the concept of elasticity, which characterizes them. Several languages and systems for cloud-based data processing have been proposed in the past, with the most popular among them being based on MapReduce [7]. In this paper, we present Exareme, a system for elastic large-scale data processing...
SPARQling Pig - Processing Linked Data with Pig Latin
In recent years, dataflow languages such as Pig Latin have emerged as flexible and powerful tools for handling complex analysis tasks on big data. These languages support schema flexibility as well as common programming patterns such as iteration. They offer extensibility through user-defined functions while running on top of scalable distributed platforms. In doing so, these languages enable a...
Table-Driven Programming in SQL for Enterprise Information Systems
In database systems, business logic is usually implemented in the forms of external processes, stored procedures, user-defined functions, components, objects, constraints, triggers, etc. In this paper, we advocate the idea of storing business logic – in the form of functions – as data in tables. This idea gives a basis for applying the software-engineering methodology of table-driven programmin...
INDREX: In-database relation extraction
The management of text data has a long-standing history in the human mankind. A particular common task is extracting relations from text. Typically, the user performs this task with two separate systems, a relation extraction system and an SQL-based query engine for analytical tasks. During this iterative analytical workflow, the user must frequently ship data between these systems. Worse, the ...
Journal title:
- PVLDB
Volume 2
Publication date: 2009