Finding frequent items in parallel
Abstract
We present a deterministic parallel algorithm for the k-majority problem. It can be used to find frequent items in parallel, i.e., those whose multiplicity is greater than a given threshold, and is therefore useful for iceberg queries and in many other contexts. The algorithm can be used both in the online (stream) context and in the offline setting. In the former we are restricted to a single scan of the input elements, so the frequent items that have been determined cannot be verified (e.g., network traffic streams passing through internet routers); in the latter, a parallel scan of the input can be used to determine the actual k-majority elements. To the best of our knowledge, this is the first parallel algorithm solving the proposed problem.
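The abstract does not describe the algorithm itself, so the following is only a sketch of the problem setting, not the paper's parallel method. It uses the classic sequential Misra-Gries summary (a swapped-in, well-known technique) to find candidate k-majority items in a single scan, then an optional second scan to verify them, mirroring the online versus offline distinction above. The function names `misra_gries` and `verify` and the sample data are assumptions made for illustration.

```python
from collections import Counter


def misra_gries(stream, k):
    """Single scan: return candidate k-majority items using at most k - 1 counters.

    Every item occurring more than n/k times is guaranteed to be a candidate,
    but some candidates may be false positives.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def verify(items, candidates, k):
    """Offline second scan: keep candidates occurring more than n/k times."""
    counts = Counter(x for x in items if x in candidates)
    n = len(items)
    return {x for x in candidates if counts[x] * k > n}


# Hypothetical sample input: "a" occurs 5 times out of 10, "b" 3 times.
items = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
candidates = misra_gries(items, k=3)       # online result (may over-report)
frequent = verify(items, candidates, k=3)  # offline result (exact)
```

In the streaming case only `candidates` is available, since the input cannot be read again; in the offline case the second scan removes any false positives.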
Similar resources
A Generalized Parallel Algorithm for Frequent Itemset Mining
A parallel algorithm for finding the frequent itemsets in a set of transactions is presented. The frequent individual items are identified by their index. We assume that the number of processors (m) is less than the number of frequent items (n). In the first stage, every processor Pi, i ∈ {1, ..., m − 1}, sequentially computes the frequent itemsets from the interval Ii = [(i − 1) · p + 1, i · p], where ...
Mining Frequent Parallel Episodes with Selective Participation
We consider the task of finding frequent parallel episodes in parallel point processes, allowing for imprecise synchrony of the events constituting occurrences (temporal imprecision) as well as incomplete occurrences (selective participation). We tackle this problem with frequent pattern mining based on the CoCoNAD methodology, which is designed to take care of temporal imprecision. To cope wit...
Finding Frequent Patterns in Parallel Point Processes
We consider the task of finding frequent patterns in parallel point processes—also known as finding frequent parallel episodes in event sequences. This task can be seen as a generalization of frequent item set mining: the co-occurrence of items (or events) in transactions is replaced by their (imprecise) co-occurrence on a continuous (time) scale, meaning that they occur in a limited (time) spa...
Finding Frequent Items over General Update Streams
We present novel space and time-efficient algorithms for finding frequent items over general update streams. Our algorithms are based on a novel adaptation of the popular dyadic intervals method for finding frequent items. The algorithms improve upon existing algorithms in both theory and practice.
A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution
We present a message-passing based parallel version of the Space Saving algorithm designed to solve the k-majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many other different contexts. We apply our algorithm to the detection of frequent items in both real and syn...
Journal: Concurrency and Computation: Practice and Experience
Volume 23
Publication date: 2011