Approximate Sorting of Data Streams with Limited Storage

Authors

  • Farzad Farnoud
  • Eitan Yaakobi
  • Jehoshua Bruck
Abstract

We consider the problem of approximately sorting a data stream in one pass with limited internal storage, where the goal is not to rearrange the data but to output a permutation that reflects the ordering of the elements of the stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use two permutation distortion metrics, the Kendall tau and Chebyshev metrics, as well as the mutual information between the output permutation and the true ordering of the data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm whose storage requirement is asymptotically within a constant factor of that of an optimal algorithm, in terms of both mutual information and average Kendall tau distortion.
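As an illustration of the two distortion metrics named in the abstract, the following is a minimal sketch, not the authors' code. It assumes each permutation is given as a rank vector (entry i is the rank of the i-th stream element); the function names and the sample stream are hypothetical, and the standard definitions are used: Kendall tau counts the pairs that the two rankings order differently, and Chebyshev is the largest displacement in rank of any single element.

```python
from itertools import combinations


def true_ranks(stream):
    """Return the rank (0 = smallest) of each stream element, assuming distinct values."""
    order = sorted(range(len(stream)), key=lambda i: stream[i])
    ranks = [0] * len(stream)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def kendall_tau_distance(p, q):
    """Count the pairs of elements that rank vectors p and q order differently."""
    if len(p) != len(q):
        raise ValueError("rank vectors must have the same length")
    return sum(
        1
        for i, j in combinations(range(len(p)), 2)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )


def chebyshev_distance(p, q):
    """Return the largest difference in rank assigned to any single element."""
    if len(p) != len(q):
        raise ValueError("rank vectors must have the same length")
    return max((abs(a - b) for a, b in zip(p, q)), default=0)


# Hypothetical example: a four-element stream and an imperfect estimate of its ranks.
stream = [0.3, 0.9, 0.1, 0.5]
exact = true_ranks(stream)      # [1, 3, 0, 2]
estimate = [0, 3, 1, 2]         # the two smallest elements are confused

print(kendall_tau_distance(exact, estimate))  # 1
print(chebyshev_distance(exact, estimate))    # 1
```

The pairwise loop takes quadratic time, which is acceptable for illustration; it says nothing about the storage-limited algorithms studied in the paper.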


Similar Resources

The Value of Multiple Read/Write Streams for Approximating Frequency Moments

We consider the read/write streams model, an extension of the standard data stream model in which an algorithm can create and manipulate multiple read/write streams in addition to its input data stream. Like the data stream model, the most important parameter for this model is the amount of internal memory used by such an algorithm. The other key parameters are the number of streams the algorit...


Efficient Approximation of Correlated Sums on Data Streams

In many applications such as IP network management, data arrives in streams, and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on (x, y) pairs, and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified fun...


Solving a New Multi-objective Inventory-Routing Problem by a Non-dominated Sorting Genetic Algorithm

This paper considers a multi-period, multi-product inventory-routing problem in a two-level supply chain consisting of a distributor and a set of customers. This problem is modeled with the aim of minimizing bi-objectives, namely the total system cost (including startup, distribution and maintenance costs) and risk-based transportation. Products are delivered to customers by some heterogeneous ...


Synopsis Construction in Data Streams

Unlike traditional data sets, stream data flow in and out of a computer system continuously and with varying update rates. It may be impossible to store an entire data stream due to its tremendous volume. To discover knowledge or patterns from data streams, it is necessary to develop data stream summarization techniques. Lots of work has been done to summarize the contents of data streams in or...


Probabilistic Counting with Randomized Storage

Previous work by Talbot and Osborne [2007] explored the use of randomized storage mechanisms in language modeling. These structures trade a small amount of error for significant space savings, enabling the use of larger language models on relatively modest hardware. Going beyond space efficient count storage, here we present the Talbot Osborne Morris Bloom (TOMB) Counter, an extended model for ...




Publication date: 2014