Simple Random Sampling from Relational Databases
Authors
Abstract
Sampling is a fundamental operation for the auditing and statistical analysis of large databases. It is not well supported in existing relational database management systems. We discuss how to obtain samples from the results of relational queries without first performing the query. Specifically, we examine simple random sampling from selections, projections, joins, unions, and intersections. We discuss data structures and algorithms for sampling, and their performance. We show that samples of relational queries can often be computed for a small fraction of the effort of computing the entire relational query, i.e., in time proportional to sample size, rather than time proportional to the size of the full result of the relational query.
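To make the claim about joins concrete, here is a minimal sketch of one well-known way to sample from a join without computing it: acceptance/rejection sampling over an index. The abstract does not spell out its algorithms, so this should be read as an illustration under stated assumptions, not as the paper's method. The names `build_index`, `sample_join`, `max_matches`, and `max_tries` are hypothetical, and the sketch assumes that an in-memory hash index on the join key of S already exists and that the largest number of S matches for any key is known.

```python
import random
from collections import defaultdict


def build_index(rows, key):
    """Build a hash index: join-key value -> list of rows carrying that value."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def sample_join(r_rows, s_index, key, max_matches, rng, max_tries=100_000):
    """Draw one (r, s) pair uniformly at random from R JOIN S on `key`.

    Assumption: max_matches is an upper bound on the number of S rows that
    share any single key value. The full join is never materialized.
    """
    for _ in range(max_tries):
        r = rng.choice(r_rows)                # uniform pick from R
        matches = s_index.get(r[key], [])     # index lookup into S
        # Accept r with probability len(matches) / max_matches, so that rows
        # of R with many join partners are not under-represented.
        if rng.random() * max_matches < len(matches):
            return r, rng.choice(matches)     # uniform pick among partners
    raise RuntimeError("no join tuple accepted; the join may be empty")


# Hypothetical toy relations, used only for illustration.
R = [{"id": 1, "k": "a"}, {"id": 2, "k": "b"}, {"id": 3, "k": "c"}]
S = [{"k": "a", "v": 10}, {"k": "a", "v": 11}, {"k": "b", "v": 20}]

s_index = build_index(S, "k")
max_matches = max(len(rows) for rows in s_index.values())
rng = random.Random(0)
sample = [sample_join(R, s_index, "k", max_matches, rng) for _ in range(5)]
```

Each trial returns a given join tuple with probability 1 / (|R| * max_matches), so every tuple of the join is equally likely. The cost per sample is a few index lookups, which is where the "time proportional to sample size" behavior comes from once the index is in place.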
Related resources
Random Sampling from Databases
Random Sampling from Databases, by Frank Olken. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor Michael Stonebraker, Chair. In this thesis I describe efficient methods of answering random sampling queries of relational databases, i.e., retrieving random samples of the results of relational queries. I begin with a discussion of the motivation for including samp...
Random function priors for exchangeable arrays with applications to graphs and relational data
A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Ald...
Optimizing Window Aggregate Functions via Random Sampling
Window functions have been a part of the SQL standard since 2003 and have been well studied during the past decade. As the demand increases in analytics tools, window functions have seen an increasing amount of potential applications. Although the current mainstream commercial databases support window functions, the existing implementation strategies are inefficient for the real-time processing...
CoDS: A Representative Sampling Method for Relational Databases
Database sampling has become a popular approach to handle large amounts of data in a wide range of application areas such as data mining or approximate query evaluation. Using database samples is a potential solution when using the entire database is not cost-effective, and a balance between the accuracy of the results and the computational cost of the process applied on the large data set is p...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with a few relations, but as the number of relations in a query grows, they need too much memory and processing to remain practical, so random and evolutionary methods have to be used instead. The use of evolutionary methods, beca...
Journal:
Volume, issue:
Pages: -
Publication date: 1986