Support Tensor Machines for Text Categorization
Authors
Abstract
We consider the problem of text representation and categorization. Conventionally, a text document is represented by a vector in a high-dimensional space, and a learning algorithm is then applied in that vector space to categorize the text. In particular, the Support Vector Machine (SVM) has received considerable attention because of its effectiveness. In this paper, we propose a new classification algorithm called the Support Tensor Machine (STM). STM uses the Tensor Space Model to represent documents: it treats a document as a second-order tensor in $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$, where $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ are two vector spaces. With the tensor representation, the number of parameters estimated by STM is much smaller than the number estimated by SVM. Our algorithm is therefore especially suitable for small-sample cases. We compared the proposed algorithm with SVM for text categorization on two standard databases. Experimental results show the effectiveness of our algorithm.
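The parameter saving claimed above can be illustrated with a small sketch. It assumes the decision function takes the bilinear form $f(X) = u^\top X v + b$ with $u \in \mathbb{R}^{n_1}$ and $v \in \mathbb{R}^{n_2}$, which the abstract itself does not spell out. The helper names (`fold_to_matrix`, `stm_decision`, `parameter_counts`) and the row-major, zero-padded folding of a term vector into a matrix are hypothetical choices made only for illustration, and the weights are fixed by hand rather than learned.

```python
def fold_to_matrix(x, n1, n2):
    """Reshape a term vector into an n1 x n2 matrix (row-major), zero-padding the tail.

    This folding scheme is an assumption for illustration only.
    """
    if len(x) > n1 * n2:
        raise ValueError("n1 * n2 must be at least len(x)")
    padded = list(x) + [0.0] * (n1 * n2 - len(x))
    return [padded[i * n2:(i + 1) * n2] for i in range(n1)]


def stm_decision(X, u, v, b):
    """Bilinear decision value u^T X v + b for a second-order tensor X."""
    total = 0.0
    for i in range(len(u)):
        for j in range(len(v)):
            total += u[i] * X[i][j] * v[j]
    return total + b


def parameter_counts(n1, n2):
    """Weight counts (bias excluded): linear SVM on R^(n1*n2) vs. bilinear model on R^n1 (x) R^n2."""
    return {"svm": n1 * n2, "stm": n1 + n2}


# Toy document with six term weights, folded into a 2 x 3 matrix.
x = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0]
X = fold_to_matrix(x, 2, 3)

# Hand-picked weights; a real STM would estimate u and v from training data.
u = [1.0, -1.0]
v = [0.5, 0.0, 1.0]
score = stm_decision(X, u, v, b=0.25)
label = 1 if score >= 0 else -1

counts = parameter_counts(100, 100)
print(score, label, counts)
```

For a 10,000-term vocabulary folded into a 100 x 100 matrix, the linear SVM estimates 10,000 weights while the bilinear model estimates 200, which is the sense in which fewer training samples may suffice.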
Similar resources
Text Categorization and Support Vector Machines
Text categorization is used to automatically assign previously unseen documents to a predefined set of categories. This paper gives a short introduction to text categorization (TC), and describes the most important tasks of a text categorization system. It also focuses on Support Vector Machines (SVMs), the most popular machine learning algorithm used for TC, and gives some justification why ...
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and they behave robustly over a variety o...
Support Vector Machines for Text Categorization Based on Latent Semantic Indexing
Text Categorization (TC) is an important component in many information organization and information management tasks. Two key issues in TC are feature coding and classifier design. This paper describes an approach to text categorization that uses Support Vector Machines (SVMs) based on Latent Semantic Indexing (LSI). Latent Semantic Indexing [1][2] is a method for selecting informative subspaces of featu...
Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization
This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations co...