Support Tensor Machines for Text Categorization

Authors

  • Deng Cai
  • Xiaofei He
  • Ji-Rong Wen
  • Jiawei Han
  • Wei-Ying Ma

Abstract

We consider the problem of text representation and categorization. Conventionally, a text document is represented by a vector in a high-dimensional space, and learning algorithms are then applied in that vector space for text categorization. In particular, the Support Vector Machine (SVM) has received considerable attention due to its effectiveness. In this paper, we propose a new classification algorithm called the Support Tensor Machine (STM). STM uses the Tensor Space Model to represent documents: it treats a document as a second-order tensor in $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$, where $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ are two vector spaces. With the tensor representation, STM estimates far fewer parameters than SVM does, so our algorithm is especially suitable for small-sample cases. We compared the proposed algorithm with SVM for text categorization on two standard databases, and the experimental results show the effectiveness of our algorithm.
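The parameter-count argument in the abstract can be made concrete with a small sketch. It assumes a rank-one linear decision function $f(X) = u^\top X v + b$ on an $n_1 \times n_2$ document matrix $X$, a row-major folding of term vectors into that matrix, and a plain subgradient hinge-loss solver standing in for a real SVM solver. None of these details, nor the helper names `to_tensor`, `param_counts`, `train_linear_hinge`, `train_stm`, and `predict`, are taken from the paper; they are hypothetical and for illustration only.

```python
import numpy as np


def to_tensor(doc_vec, n1, n2):
    """Fold a length-(n1 * n2) term-weight vector into an n1 x n2 matrix.

    Assumption: terms fill the matrix in plain row-major index order; the
    paper's own term-to-cell assignment may differ.
    """
    doc_vec = np.asarray(doc_vec, dtype=float)
    if doc_vec.size != n1 * n2:
        raise ValueError("doc_vec must have exactly n1 * n2 entries")
    return doc_vec.reshape(n1, n2)


def param_counts(n1, n2):
    """(vector-space linear SVM, rank-one tensor classifier) counts, bias included."""
    return n1 * n2 + 1, n1 + n2 + 1


def train_linear_hinge(Z, y, lam=1e-2, epochs=200, lr=0.1):
    """Subgradient descent on the L2-regularized hinge loss (hypothetical helper).

    A simple stand-in for a real SVM solver.
    Z: (m, d) features, y: (m,) labels in {-1, +1}.
    """
    m, d = Z.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        active = y * (Z @ w + b) < 1  # samples inside the margin
        grad_w = lam * w - (y[active][:, None] * Z[active]).sum(axis=0) / m
        grad_b = -y[active].sum() / m
        step = lr / np.sqrt(t)  # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def train_stm(X, y, iters=5, **solver_kw):
    """Alternating sketch of f(X) = u^T X v + b for X of shape (m, n1, n2).

    With v fixed, u^T X v = u^T (X v) is linear in u (n1 features);
    with u fixed, it is linear in v (n2 features). Each half-step is one
    ordinary linear hinge-loss problem.
    """
    _, n1, n2 = X.shape
    v = np.ones(n2)  # assumed initialization
    u, b = np.zeros(n1), 0.0
    for _ in range(iters):
        u, b = train_linear_hinge(X @ v, y, **solver_kw)  # features: (m, n1)
        v, b = train_linear_hinge(np.einsum("i,mij->mj", u, X), y, **solver_kw)  # (m, n2)
    return u, v, b


def predict(X, u, v, b):
    """Sign of u^T X v + b for every document matrix in X."""
    return np.sign(np.einsum("i,mij,j->m", u, X, v) + b)
```

With $n_1 = n_2 = 100$ (a 10,000-term vocabulary), `param_counts(100, 100)` returns `(10001, 201)`: a linear SVM on the flat vector fits one weight per term, while the rank-one tensor model fits only the two projection vectors $u$ and $v$. This gap is what makes the tensor representation attractive when few labeled documents are available. The alternating scheme is one common way to fit such a model; the regularization here is simplified and is not claimed to match the paper's formulation.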

Similar resources

Text Categorization and Support Vector Machines

Text categorization is used to automatically assign previously unseen documents to a predefined set of categories. This paper gives a short introduction into text categorization (TC), and describes the most important tasks of a text categorization system. It also focuses on Support Vector Machines (SVMs), the most popular machine learning algorithm used for TC, and gives some justification why ...

Text Categorization with Support Vector Machines: Learning with Many Relevant Features (Universität Dortmund, Fachbereich Informatik, Lehrstuhl VIII Künstliche Intelligenz)

This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and they behave robustly over a variety o...

Support Vector Machines for Text Categorization Based on Latent Semantic Indexing

Text Categorization (TC) is an important component in many information organization and information management tasks. Two key issues in TC are feature coding and classifier design. In this paper, a text categorization approach using Support Vector Machines (SVMs) based on Latent Semantic Indexing (LSI) is described. Latent Semantic Indexing [1][2] is a method for selecting informative subspaces of featu...

Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization

This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations co...

Publication date: 2006