Uncertainty-Based Noise Reduction and Term Selection in Text Categorization
Similar resources
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, a main goal in text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Project full title: People and Knowledge Cross Lingual Information Gathering. Security (distribution level): Public. Contractual date of delivery: N/A. Deliverable name: Uncertainty and Term Selection in Text Categorization
An analysis on Frequency of terms for Text Categorization
Preliminary results on a way to reduce terms for text categorization are presented. We have used the transition point, a frequency which splits the words of a text into high-frequency words and low-frequency words. Thresholds derived from the document frequency of terms, Information Gain and χ² were tested in combination with the transition point. A text categorization experiment based on Rocchio’...
Improving Feature Selection Techniques for Machine Learning
As a commonly used technique in data preprocessing for machine learning, feature selection identifies important features and removes irrelevant, redundant or noise features to reduce the dimensionality of feature space. It improves efficiency, accuracy and comprehensibility of the models built by learning algorithms. Feature selection techniques have been widely employed in a variety of applica...
A Comparative Study on Feature Selection in Text Categorization
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholdin...
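As an illustration of two of the criteria named in this abstract, the sketch below computes document frequency and the χ² statistic between a term and a category from a 2×2 contingency table. It is a minimal sketch and not the paper's implementation: the function names, the toy corpus, and the category labels are all assumptions made for the example.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term at most once
    return df

def chi_square(docs, labels, term, category):
    """Chi-square statistic between a term and a category.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or category is constant
    return n * (a * d - c * b) ** 2 / denom

# Hypothetical toy corpus, already tokenized (illustration only).
docs = [
    ["goal", "match"],
    ["goal", "team"],
    ["vote", "party"],
    ["vote", "team"],
]
labels = ["sport", "sport", "politics", "politics"]

df = document_frequency(docs)
goal_score = chi_square(docs, labels, "goal", "sport")
team_score = chi_square(docs, labels, "team", "sport")
```

In this toy corpus "goal" occurs only in the "sport" documents and scores high, while "team" is spread evenly across both categories and scores zero, so a χ² threshold would keep the first term and drop the second.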