Search Framework for Mining Classification
Abstract
The classification-rule-learning task is presented as a search process of finding a classification-decision tree that meets users' preferences and requirements. Users can control the efficiency of the mining process and the quality of the final decision tree through the search parameters. This search framework allows users to adapt easily to different domains and different sets of data by modifying different search parameters. The mining process starts with a specific set of values for the search parameters, and the process can be repeated with different search-parameter values until a satisfactory result is obtained. This framework also allows the development of new algorithms: a set of search parameters that is frequently and successfully used can define a new algorithm. We present two new algorithms developed in this search framework and compare them to well-known algorithms. BF uses best-first ordering to expand the frontier nodes of the decision trees, instead of the conventional depth-first or breadth-first criteria. CDP+ dynamically adjusts the depth pruning criteria for a node. Experimental results show that BF has a new capability to guide the construction of decision trees based on the overall error-rate criteria. In addition, CDP+ often outperforms several decision-tree learning algorithms in error rate and number of nodes generated.
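The idea of best-first frontier expansion can be illustrated with a minimal Python sketch. This is not the authors' BF algorithm, whose details the abstract does not give. It assumes numeric features, threshold splits, training-error reduction as the ordering criterion, and a hypothetical `max_splits` budget standing in for a search parameter. All function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def errors(labels):
    """Misclassifications if the node predicts its majority label."""
    return len(labels) - max(Counter(labels).values()) if labels else 0


def best_split(rows):
    """Return (gain, feature, threshold, left, right) for the split that
    removes the most training errors, or None if no split helps."""
    base = errors([y for _, y in rows])
    best = None
    for f in range(len(rows[0][0])):
        # Skip the largest value so the right child is never empty.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            gain = base - errors([y for _, y in left]) - errors([y for _, y in right])
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, f, t, left, right)
    return best


def grow_best_first(rows, max_splits):
    """Grow a tree by always expanding the frontier leaf with the largest gain.

    max_splits is an assumed budget parameter, not one named in the abstract.
    """
    root = {"rows": rows}
    tie = count()  # unique tie-breaker so heapq never compares dicts
    frontier = []

    def push(node):
        split = best_split(node["rows"])
        if split:
            # heapq is a min-heap, so negate the gain to pop the best first.
            heapq.heappush(frontier, (-split[0], next(tie), node, split))

    push(root)
    splits = 0
    while frontier and splits < max_splits:
        _, _, node, (_, f, t, left, right) = heapq.heappop(frontier)
        node.update(feature=f, threshold=t,
                    left={"rows": left}, right={"rows": right})
        push(node["left"])
        push(node["right"])
        splits += 1
    return root


def predict(node, x):
    """Walk from the root to a leaf and return the leaf's majority label."""
    while "feature" in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return Counter(y for _, y in node["rows"]).most_common(1)[0][0]
```

Unlike depth-first or breadth-first growth, the order in which leaves are expanded here depends on how much each expansion lowers the overall training error, so a small budget is spent on the most useful splits first. Because the sketch only accepts splits with positive error reduction, it will not split data where no single threshold helps.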
Similar resources
An Efficient Two Step Method for Classification of Spatial Data
Spatial data mining, i.e., discovery of interesting implicit knowledge in spatial databases, is a highly demanding field because very large amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. In this paper, an efficient method for building decision trees for...
Text Mining in Social Networks
Social networks are rich in various kinds of contents such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for a wide variety of applications such as keyword search, classification, and clustering. While search and classification are well known...
Comparison of genetic algorithm based prototype selection schemes
Prototype selection is the process of finding representative patterns from the data. Representative patterns help in reducing the data on which further operations such as data mining can be carried out. The current work discusses computation of prototypes using medoids [1], leaders [2] and distance based thresholds. After finding the initial set of prototypes, the optimal set is found by means of...
Algorithms and Applications for Universal Quantification in Relational
Queries containing universal quantification are used in many applications, including business intelligence applications and in particular data mining. We present a comprehensive survey of the structure and performance of algorithms for universal quantification. We introduce a framework that results in a complete classification of input data for universal quantification. Then we go on to identify th...
SPRINT: A Scalable Parallel Classifier for Data Mining
Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memo...
SLIQ: A Fast Scalable Classifier for Data Mining
Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SL...