Short Text Classification Based on Improved ITC
Similar resources
Short Text Classification Based on Improved ITC
Long text classification has achieved strong results, but short text classification still needs improvement. In this paper, we first explain why we select the ITC feature selection algorithm rather than the conventional TFIDF and describe the advantages of ITC over TFIDF; we then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selec...
Improved Graph Based K-NN Text Classification
This paper presents an improved graph based k-nn algorithm for text classification. Most organizations face the problem of large amounts of unorganized data. Most existing text classification techniques are based on the vector space model, which ignores the structural information of the document, namely the word order or the co-occurrence of terms. In this paper we have ...
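The point about the vector space model discarding word order can be illustrated with a minimal sketch. The function name `bag_of_words` and the two sample sentences are illustrative assumptions, not taken from the paper, and whitespace tokenization is a simplification.

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to term counts; word order is discarded."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings but the same terms.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Both documents get the same term-count vector, so a classifier
# working only on these counts cannot tell them apart.
same_vector = (a == b)
```

A graph based representation, as the abstract suggests, would keep some of the ordering or co-occurrence information that this representation loses.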
Short Text Classification Improved by Learning Multi-Granularity Topics
Understanding the rapidly growing short text is very important. Short text is different from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more...
An Improved Text Classification Method Based on Gini Index
In text classification, the purity form of the Gini index can be used. The greater the purity value, the more information the attribute carries and the stronger the feature's distinguishing capability. However, when the Gini purity formula is applied to feature weighting, the classification result is not very good; one of the main reasons is those rare words only appearing in one categor...
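The rare-word problem mentioned above can be made concrete with a small sketch. The truncated abstract does not give its formula, so this assumes one common purity form, the sum over classes of the squared probability of the class given the word; the function name `gini_purity` and the counts are hypothetical.

```python
def gini_purity(counts_per_class):
    """Assumed purity form: sum over classes of P(class | word) ** 2.

    counts_per_class holds, for one word, the number of documents
    containing it in each class (hypothetical input format).
    """
    total = sum(counts_per_class)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts_per_class)

# A frequent word spread over three classes.
common_word = gini_purity([40, 35, 25])
# A word seen once, in a single class.
rare_word = gini_purity([1, 0, 0])
```

Under this assumed form, the word seen only once receives the maximum purity of 1.0 even though a single occurrence is weak evidence, which is consistent with the weakness the abstract points to.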
An Improved KNN Text Classification Algorithm Based on Clustering
The traditional KNN text classification algorithm uses all training samples for classification, so it must process a huge number of training samples with high computational complexity, and it does not reflect the differing importance of individual samples. To address the problems mentioned above, an improved KNN text classification algorithm based on clustering center is proposed in this ...
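For reference, the traditional behaviour described above can be sketched as a brute-force KNN over term-count vectors. This is a generic illustration, not the paper's implementation; the names `cosine` and `knn_predict`, the cosine similarity measure, and the toy training set are all assumptions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-count dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(query, training, k=3):
    """Score the query against every training sample, then vote.

    training is a list of (term_count_dict, label) pairs. Each of the
    k nearest samples casts one equal vote.
    """
    scored = sorted(training, key=lambda s: cosine(query, s[0]), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data.
training = [
    ({"goal": 2, "match": 1}, "sport"),
    ({"team": 1, "goal": 1}, "sport"),
    ({"stock": 2, "market": 1}, "finance"),
    ({"market": 1, "price": 1}, "finance"),
]
label = knn_predict({"goal": 1, "team": 1}, training, k=3)
```

Every prediction compares the query with the whole training set and weights all neighbours equally, which are the two costs the abstract attributes to the traditional algorithm.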
Journal
Journal title: Journal of Computer and Communications
Year: 2013
ISSN: 2327-5219,2327-5227
DOI: 10.4236/jcc.2013.14004