Projection Based Large Scale High-Dimensional Data Similarity Join Using MapReduce Framework
Similar resources
MR-DSJ: Distance-Based Self-Join for Large-Scale Vector Data Analysis with MapReduce
Data analytics gets faced with huge and tremendously increasing amounts of data for which MapReduce provides a very convenient and effective distributed programming model. Various algorithms already support massive data analysis on computer clusters but, in particular, distance-based similarity self-joins lack efficient solutions for large vector data sets though they are fundamental in many da...
Fast similarity join for multi-dimensional data
To appear in Information Systems Journal, Elsevier, 2005. The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memor...
MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...
Comparing MapReduce-Based k-NN Similarity Joins on Hadoop for High-Dimensional Data
Similarity joins represent a useful operator for data mining, data analysis and data exploration applications. With the exponential growth of data to be analyzed, distributed approaches like MapReduce are required. So far, the state-of-the-art similarity join approaches based on MapReduce mainly focused on the processing of low-dimensional vector data. In this paper, we revisit and investigate ...
Detecting Communities over Large Scale Graph Structure Data using MapReduce
With the appearance of the internet, there is growing interest in executing analysis tasks over large scale graph structure data. These tasks include processing of subgraphs or multi-hop neighborhoods in a graph. Examples of such tasks include identifying social circles, modified recommendation, anomaly finding, link prediction and so on. These works are not well served by the vertex centric appr...
Journal
Journal title: IEEE Access
Year: 2020
ISSN: 2169-3536
DOI: 10.1109/access.2020.3007028