Studying SVM Method's Scalability Using Text Documents
Abstract
In recent years the quantity of text documents has been increasing continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential for obtaining good results, and the quality of learning depends on the size of the training data set. When working with huge training data sets, problems arise because the training time increases exponentially. In this paper we present a method that allows huge data sets to be used in the training step without an exponential increase in training time and without a significant decrease in classification accuracy.
Similar resources
Electronic Document Classification Using Support Vector Machine - An Application for E-Learning
Application of machine learning techniques to embed adaptivity in E-Learning frameworks is receiving considerable attention. Text classification, the task of automatically assigning semantic categories to natural language text, has therefore become one of the key methods for organizing digital content. Reports on SVM have mainly focused on the theory or conceptual application of the...
Comparative Assessment of the Performance of Three WEKA Text Classifiers Applied to Arabic Text
This research is conducted in order to compare the performance of three known text classification techniques namely, Support Vector Machine (SVM) classifier, Naïve Bayes (NB) classifier, and C4.5 Classifier. Text classification aims to automatically assign the text to a predefined category based on linguistic features, and content. These three techniques are compared using a set of Arabic text ...
FISA: Feature-Based Instance Selection for Imbalanced Text Classification
Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative train...
Aspects concerning on the SVM Method’s Scalability
In recent years the quantity of text documents has been increasing continually, and automatic document classification is an important challenge. In text document classification, the training step is essential for obtaining a good classifier. The quality of learning depends on the dimension of the training data. When working with huge learning data sets, problems regarding the training time that in...
Classification of Text Documents Based on Minimum System Entropy
In this paper, we describe a new approach to classification of text documents based on the minimization of system entropy, i.e., the overall uncertainty associated with the joint distribution of words and labels in the collection. The classification algorithm assigns a class label to a new document in such a way that its insertion into the system results in the maximum decrease (or least increa...
Journal: Scalable Computing: Practice and Experience
Volume 9, Issue
Pages -
Publication date: 2008