Text Categorization via Ellipsoid Separation

Authors

  • Andriy Kharechko
  • John Shawe-Taylor
  • Ralf Herbrich
  • Thore Graepel

Abstract

We present a new batch learning algorithm for text classification in the vector space of document representations. The algorithm uses ellipsoid separation [3] in the feature space, which leads to a semidefinite program. For feature extraction, we use an approximation of the latent semantic feature extraction approach based on Gram-Schmidt orthogonalization [2]. Preliminary results demonstrate some potential for the presented approach.
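The abstract states only that ellipsoid separation leads to a semidefinite program. As a hedged illustration, and not necessarily the exact formulation of [3] or of this paper, one can search for a quadratic function q(x) = x^T P x + b^T x + c with P positive semidefinite that is at most -1 on positive-class documents (index set I+) and at least 1 on negative-class documents (index set I-). Every constraint is linear in (P, b, c), because x^T P x = tr(P x x^T), so together with P ⪰ 0 the problem is a semidefinite program. The trace objective is just one illustrative regularizer.

```latex
\begin{aligned}
\min_{P,\,b,\,c}\quad & \operatorname{tr}(P) \\
\text{s.t.}\quad & x_i^\top P x_i + b^\top x_i + c \le -1, && i \in \mathcal{I}_+, \\
& x_j^\top P x_j + b^\top x_j + c \ge 1, && j \in \mathcal{I}_-, \\
& P \succeq 0 .
\end{aligned}
```

When the optimal P is positive definite, the region {x : q(x) ≤ 0} is a bounded ellipsoid containing the positive documents; if P is only semidefinite, the region may be unbounded along the null space of P.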
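The abstract does not spell out the feature-extraction step. The sketch below is a minimal, hedged illustration of greedy Gram-Schmidt orthogonalization over document vectors as a cheap approximation to latent semantic indexing. It assumes raw term-frequency vectors and a pivoting rule that picks the document with the largest residual norm at each step. The helpers `gram_schmidt_features` and `dot` are hypothetical names, and this is not claimed to be the exact procedure of [2] or of this paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_features(
    docs: List[Vector], k: int, tol: float = 1e-12
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical greedy Gram-Schmidt feature extractor (illustrative only).

    docs: term-frequency vectors of equal length, one per document.
    k:    maximum number of orthonormal basis directions to extract.
    Returns (basis, features) where features[i][j] = <docs[i], basis[j]>.
    """
    if not docs:
        return [], []
    # Residuals hold the part of each document not yet explained by the basis.
    residuals = [[float(x) for x in d] for d in docs]
    basis: List[Vector] = []
    for _ in range(k):
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        # Assumed pivoting rule: take the document with the largest residual.
        best = max(range(len(residuals)), key=lambda i: norms[i])
        if norms[best] <= tol:
            break  # every document already lies in the span of the basis
        q = [x / norms[best] for x in residuals[best]]
        basis.append(q)
        # Project the new direction out of every residual.
        for r in residuals:
            c = dot(r, q)
            for t in range(len(r)):
                r[t] -= c * q[t]
    # Features are the coordinates of the original documents in the basis.
    features = [[dot(d, q) for q in basis] for d in docs]
    return basis, features
```

Each document ends up represented by at most k coordinates in the extracted basis; choosing k smaller than the rank of the term-document matrix is what yields the dimensionality reduction, and these coordinates would be the feature-space inputs to the ellipsoid separation step.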

Similar resources

Separating stable sets in claw-free graphs via Padberg-Rao and compact linear programs

In this paper, we provide the first linear programming formulations for the stable set problem in claw-free graphs, together with polynomial time separation routines for those formulations (they are not compact). We then exploit one of those extended formulations and propose a new polytime algorithm for solving the separation problem for the stable set polytope of claw-free graphs. This routine...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods for searching, filtering and managing resources are in high demand. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Text Categorization from Category Name via Lexical Reference

Requiring only category names as user input is a highly attractive, yet hardly explored, setting for text categorization. Earlier bootstrapping results relied on similarity in LSA space, which captures rather coarse contextual similarity. We suggest improving this scheme by identifying concrete references to the category name’s meaning, obtaining a special variant of lexical expansion.

Chinese Text Categorization via Bottom-Up Weighted Word Clustering

Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed other methods for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is an effective approach to classification and has been shown to be a good method for reducing the dimensionality of term vectors. The authors used hierarchical term clustering and ag...

Variable Selection as an Instance-Based Ontology Mapping Strategy

The paper presents a novel instance-based approach to aligning concepts taken from two heterogeneous ontologies populated with text documents. We introduce a concept similarity measure based on the size of the intersection of the sets of variables which are most important for the class separation of the instances in both input ontologies. We suggest a VC dimension variable selection criterion e...

Publication date: 2004