Parallel Nearest Neighbour Algorithms for Text Categorization

Authors

  • Reynaldo Gil-García
  • José M. Badía
  • Aurora Pons-Porrata

Abstract

In this paper we describe the parallelization of two nearest neighbour classification algorithms. Nearest neighbour methods are well-known machine learning techniques that have been successfully applied to the Text Categorization task. Based on standard parallel techniques, we propose two versions of each algorithm for message-passing architectures. We also include experimental results on a cluster of personal computers using a large text collection. Our algorithms attempt to balance the load among the processors, are portable, and achieve very good speedups and scalability.
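The abstract does not say how the work is divided among processors, so the following is only a minimal sketch of one common data-parallel pattern for k-nearest-neighbour text classification, not the authors' method. It assumes documents are sparse term-weight dictionaries compared by cosine similarity, that the training set is split into contiguous blocks (one per worker), and that threads stand in for the processors of a message-passing cluster. The names `cosine`, `local_top_k` and `parallel_knn` are hypothetical.

```python
"""Sketch (hypothetical, not the paper's algorithm): data-parallel kNN.

Assumptions: documents are sparse {term: weight} dicts, similarity is cosine,
the training set is block-partitioned across workers, and a thread pool
simulates the processors of a message-passing machine.
"""
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def local_top_k(block, query, k):
    """Worker task: the k most similar (similarity, label) pairs in one block."""
    return heapq.nlargest(k, ((cosine(query, vec), label) for vec, label in block))


def parallel_knn(train, query, k=3, workers=4):
    """Classify `query` by majority vote of its k nearest training documents.

    Assumes `train` is a non-empty list of (vector, label) pairs.
    """
    # Split the training set into at most `workers` contiguous blocks.
    size = max(1, math.ceil(len(train) / workers))
    blocks = [train[i:i + size] for i in range(0, len(train), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each "processor" searches only its own block.
        partials = list(pool.map(partial(local_top_k, query=query, k=k), blocks))
    # Merge step: the global top-k lies within the union of the local top-k lists.
    neighbours = heapq.nlargest(k, (pair for part in partials for pair in part))
    # Majority vote; equal counts resolve to the label met first in the ranking.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ({"goal": 1.0, "match": 1.0}, "sport"),
        ({"match": 1.0, "team": 1.0}, "sport"),
        ({"stock": 1.0, "market": 1.0}, "finance"),
        ({"market": 1.0, "bank": 1.0}, "finance"),
        ({"team": 1.0, "goal": 1.0}, "sport"),
    ]
    print(parallel_knn(train, {"goal": 1.0, "team": 1.0}, k=3, workers=2))  # sport
```

Only k candidates per worker travel to the merge step, so communication stays small regardless of block size; balancing the load then amounts to giving each worker a similar share of the similarity computations.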

Similar resources

SVM-Based Improvement in kNN for Text Categorization

In today's library science, information and computer science, online text classification or text categorization is a major challenge. [1] With the enormous growth of online information and data, text categorization has become one of the crucial techniques for handling and organizing text data. Various learning algorithms have been applied to text categorization. On the basis of ...

Neighbor-weighted K-nearest neighbor for unbalanced text corpus

Text categorization or classification is the automated assignment of text documents to predefined classes based on their contents. Many classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets often appear in practical applications. To deal with uneven text sets, we propose the neighbor-wei...

Text Categorization: Building a kNN Classifier for the Reuters-21578 Collection

Categorization of texts into topical categories has attracted rapidly growing interest over the past few years. The rapid growth of online information creates a need for tools that help in finding, filtering and managing high-dimensional data. Building a text classifier by hand is time-consuming and costly, and hence automated text categorization has gained a lot of importance. A general...

Arabic text classification using k-nearest neighbour algorithm

Many algorithms have been applied to the problem of Automatic Text Categorization (ATC). Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts. We have investigated the use of the K-Nearest Neighbour (K-NN) classifier, with Inew, cosine, Jaccard and Dice similarities, in order to enhance Arabic ATC. We represent the datas...

Natural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers

N-grams offer fast, language-independent multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by (a) nearest neighbour (KNN) and (b) a genetic algorithm operating on weights in a nearest neighbour classifier. 91% accuracy is found on binary classification of short multi-author technical English documents. This falls if more cat...

Publication date: 2007