Text Categorization via Ellipsoid Separation
Authors
Abstract
We present a new batch learning algorithm for text classification in the vector space of document representations. The algorithm uses ellipsoid separation [3] in the feature space, which leads to a semidefinite program. Features are extracted with an approximation of the latent semantic feature extraction approach based on Gram-Schmidt orthogonalization [2]. Preliminary results demonstrate some potential for the presented approach.
Related resources
Separating stable sets in claw-free graphs via Padberg-Rao and compact linear programs
In this paper, we provide the first linear programming formulations for the stable set problem in claw-free graphs, together with polynomial time separation routines for those formulations (they are not compact). We then exploit one of those extended formulations and propose a new polytime algorithm for solving the separation problem for the stable set polytope of claw-free graphs. This routine...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Text Categorization from Category Name via Lexical Reference
Requiring only category names as user input is a highly attractive, yet hardly explored, setting for text categorization. Earlier bootstrapping results relied on similarity in LSA space, which captures rather coarse contextual similarity. We suggest improving this scheme by identifying concrete references to the category name’s meaning, obtaining a special variant of lexical expansion.
Chinese Text Categorization via Bottom-Up Weighted Word Clustering
Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed other methods for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is an effective approach to classification and has proved to be a good method for reducing the dimensionality of term vectors. The authors used hierarchical term clustering and ag...
Variable Selection as an Instance-Based Ontology Mapping Strategy
The paper presents a novel instance-based approach to aligning concepts taken from two heterogeneous ontologies populated with text documents. We introduce a concept similarity measure based on the size of the intersection of the sets of variables which are most important for the class separation of the instances in both input ontologies. We suggest a VC dimension variable selection criterion e...