Incorporating Prior Knowledge into Task Decomposition for Large-Scale Patent Classification
نویسندگان
چکیده
With the adoption of min-max-modular support vector machines (SVMs) to solve large-scale patent classification problems, a novel, simple method for incorporating prior knowledge into task decomposition is proposed and investigated. Two kinds of prior knowledge described in patent texts are considered: time information, and hierarchical structure information. Through experiments using the NTCIR-5 Japanese patent database, patents are found to have time-varying features that considerably affect classification. The experimental results demonstrate that applying min-max modular SVMs with the proposed method gives performance superior to that of conventional SVMs in terms of training time, generalization accuracy, and scalability.
منابع مشابه
Áòòóöôóööøøòò Ôööóö Òóûððððð Òøó Ðððöòòòò Ý Úúúúòò Øöööòòòò Øø ¾ Ååò¹ññü Ñóóùððö Òòøûóöö
In most large-scale real-world pattern classification problems, there is always some explicit information besides given training data, namely prior knowledge, with which the training data are organized. In this paper, we proposed a framework for incorporating this kind of prior knowledge into the training of min-max modular (M) classifier to improve learning performance. In order to evaluate th...
متن کاملLearning from imbalanced data sets with a Min-Max modular support vector machine
Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor majority classes, resulting in very low or even no detection of minority classes. A Min-Max modular support vector machine (M-SVM) approaches this problem by decomposing the training input sets of the majority classes into subsets of sim...
متن کاملGender Classification Using a Min-Max Modular Support Vector Machine with Incorporating Prior Knowledge
Gender classification based on facial images is a large-scale, complicated two-class classification problem by nature. The reason is that few knowledge is known about the mechanism of human beings discriminating male and female from facial images, and a large number of various facial images is required to train the gender classifier. This paper presents a method for dealing with the gender clas...
متن کاملIncorporating Linguistic Knowledge for Learning Distributed Word Representations
Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge...
متن کاملA Parallel and Modular Pattern Classification Framework for Large-Scale Problems
The number of samples that are available on the internet to train pattern classifiers is increasing rapidly, while traditional pattern classification techniques based on a single computer system are powerless to process these large-scale data sets. This chapter presents a parallel and modular pattern classification framework for coping with large-scale pattern classification problems. The propo...
متن کامل