Splash: Integrated Ad-Hoc Querying of Data and Statistical Models

Authors

  • Lujun Fang
  • Kristen LeFevre
Abstract

This paper presents a system called Splash, which integrates statistical modeling and SQL for the purpose of ad-hoc querying and analysis. Splash supports a novel, simple, and practical abstraction of statistical modeling as an aggregate function, which in turn provides for natural integration with standard SQL queries and a relational DBMS. In addition, we introduce and implement a novel representatives operator to help explain statistical models using a limited number of representative examples. We present a proof-of-concept implementation of the system, which includes several performance optimizations. An experimental study indicates that our system scales well to large input datasets. Further, to demonstrate the simplicity and usability of the new abstractions, we conducted a case study using Splash to perform a series of exploratory analyses on network log data. Our study indicates that the query-based interface is simpler than a common data mining software package and that, for ad-hoc analysis, it often requires less programming effort to use.
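The idea of treating a statistical model as an aggregate function can be illustrated with a small sketch. The code below is not Splash and does not reproduce its syntax; it assumes SQLite's user-defined aggregate mechanism (via Python's `sqlite3` module) as a stand-in DBMS, and the aggregate name `linreg`, the table `log`, and its columns `host`, `hour`, and `bytes` are hypothetical names chosen for illustration. It fits one simple least-squares line per group, so that model fitting composes with an ordinary `GROUP BY` query.

```python
import json
import sqlite3


class LinRegAgg:
    """Hypothetical aggregate: fits y = slope * x + intercept by least squares."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # Called once per input row; accumulate the sufficient statistics.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Called once per group; return the fitted model as a JSON string.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            return None  # not enough distinct points to fit a line
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return json.dumps({"slope": slope, "intercept": intercept})


def fit_models_per_group(rows):
    """Load (host, hour, bytes) rows and fit one model per host using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("linreg", 2, LinRegAgg)
    conn.execute("CREATE TABLE log (host TEXT, hour REAL, bytes REAL)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT host, linreg(hour, bytes) FROM log GROUP BY host ORDER BY host"
    ).fetchall()
    conn.close()
    return {h: (json.loads(m) if m is not None else None) for h, m in result}
```

Because the model is returned by an aggregate, it can be combined with `WHERE`, `GROUP BY`, and joins like any other aggregate such as `AVG` or `COUNT`; how Splash itself represents and stores the resulting model is not specified in this abstract.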


Similar resources

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...


PADX: Querying Large-scale Ad Hoc Data with XQuery

This paper describes our experience designing and implementing PADX, a system for querying large-scale ad hoc data sources with XQuery. PADX is the synthesis and extension of two existing systems: PADS and Galax. With PADX, an analyst writes a declarative data description of the physical layout of her ad hoc data, and the PADS compiler produces customizable libraries for parsing the data and fo...


Modeling of VANET Technology & Ad-Hoc Routing Protocols Based on High Performance Random Waypoint Models

Today, one of the new technologies of the modern era is the Vehicular Ad-hoc Network, which has attracted enormous attention in recent years. Rapid topology changes and frequent disconnections make it difficult to design an efficient routing protocol for routing data between vehicles, called V2V or vehicle-to-vehicle communication, and between vehicles and roadside infrastructure, called V2I. Designin...


Moving Objects Database Technology for Ad-Hoc Querying and Satellite Data Retrieval of Dynamic Atmospheric Events

Existing state-of-the-art and web-based weather event information portals, data archives, and forecast services provide excellent subsetting and visualizations of weather events and satellite sensor measurements. However, users only obtain limited, simple, and hard-coded query, retrieval, and analysis capabilities from these sources. One non-existent but desirable capability is the accurate and...


Self-service Ad-hoc Querying Using Controlled Natural Language

The ad-hoc querying process is slow and error-prone because business experts cannot access data directly without involving IT experts. The problem lies in the complexity of the means used to query data. We propose a new natural language- and semistar ontology-based ad-hoc querying approach which lowers the steep learning curve required to be able to query data. The proposed approach would sign...




Publication date: 2009