Parallel Nearest Neighbour Algorithms for Text Categorization
نویسندگان
چکیده
In this paper we describe the parallelization of two nearest neighbour classification algorithms. Nearest neighbour methods are well-known machine learning techniques. They have been successfully applied to Text Categorization task. Based on standard parallel techniques we propose two versions of each algorithm on message passing architectures. We also include experimental results on a cluster of personal computers using a large text collection. Our algorithms attempt to balance the load among the processors, they are portable, and obtain very good speedups and scalability.
منابع مشابه
Svm Based Improvement in Knn for Text Categorization
ABSTRACTIn today‟s library science, information and computer science, online text classification or text categorization is a huge complication. [1]With the enormous growth of online information and data, text categorization has become one of the crucial techniques for handling and standardizing text data. Various learning algorithms have been applied on text for categorization. On the basis of ...
متن کاملNeighbor-weighted K-nearest neighbor for unbalanced text corpus
Text categorization or classification is the automated assigning of text documents to pre-defined classes based on their contents. Many of classification algorithms usually assume that the training examples are evenly distributed among different classes. However, unbalanced data sets often appear in many practical applications. In order to deal with uneven text sets, we propose the neighbor-wei...
متن کاملTEXT CATEGORIZATION Building a kNN classifier for the Reuters-21578 collection
Categorization of texts into topical categories has gained booming interest over the past few years. There is a growing need for tools that help in finding, filtering and managing the highdimensional data due to the rapid growth of online information. Building a text classifier by hand is time consuming and costly and hence automated text categorization has gained a lot of importance. A general...
متن کاملArabic text classification using k-nearest neighbour algorithm
Many algorithms have been implemented to the problem of Automatic Text Categorization (ATC). Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts. We have investigated the use of the K-Nearest Neighbour (K-NN) classifier, with an Inew, cosine, jaccard and dice similarities, in order to enhance Arabic ATC. We represent the datas...
متن کاملNatural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers
N grams o er fast language independent multi-class text categorization. Text is reduced in a single pass to ngram vectors. These are assigned to one of several classes by a) nearest neighbour (KNN) and b) genetic algorithm operating on weights in a nearest neighbour classi er. 91% accuracy is found on binary classi cation on short multi-author technical English documents. This falls if more cat...
متن کامل