Search Framework for Mining Classification

Authors

Shashi Shekhar

Vipin Kumar

M. Ganesh

Jaideep Srivastava

Abstract

The classification-rule-learning task is presented as a search process of finding a classification decision tree that meets users' preferences and requirements. Users can control the efficiency of the mining process and the quality of the final decision tree through the search parameters. This search framework allows users to easily adapt to different domains and different sets of data by modifying different search parameters. The mining process starts with a specific set of values for the search parameters, and the process can be repeated with different search-parameter values until a satisfactory result is obtained. This framework also allows the development of new algorithms. A set of search parameters that is frequently and successfully used can define a new algorithm. We present two new algorithms developed in this search framework and compare them to well-known algorithms. BF uses best-first ordering to expand the frontier nodes of the decision trees, instead of the conventional depth-first or breadth-first criteria. CDP+ dynamically adjusts the depth pruning criteria for a node. Experimental results show that BF has a new capability to guide the construction of decision trees based on the overall error-rate criteria. In addition, CDP+ often outperforms several decision-tree learning algorithms in error rate and number of nodes generated.

... the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred.
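
The abstract gives no pseudocode for BF, so the following is only a minimal sketch of the general idea of best-first frontier expansion, not the authors' algorithm. It assumes that frontier nodes are ranked by how many training errors their best split would remove, that splits are simple numeric thresholds, and that the search is bounded by a hypothetical `max_expansions` parameter. All function and parameter names here are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def majority_errors(labels):
    """Examples misclassified if the node predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_split(rows, labels):
    """Return (gain, feature, threshold) of the split removing the most errors."""
    base = majority_errors(labels)
    best = (0, None, None)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - majority_errors(left) - majority_errors(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best


def new_node():
    return {"rows": [], "labels": [], "split": None, "children": None}


def grow_best_first(rows, labels, max_expansions):
    """Always expand the frontier node whose split reduces errors the most."""
    tie = count()  # tie-breaker so the heap never has to compare node dicts
    root = new_node()
    root["rows"], root["labels"] = list(rows), list(labels)
    frontier = []

    def push(node):
        gain, f, t = best_split(node["rows"], node["labels"])
        if gain > 0:  # assumption: nodes with no error reduction stay leaves
            heapq.heappush(frontier, (-gain, next(tie), node, f, t))

    push(root)
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, node, f, t = heapq.heappop(frontier)
        left, right = new_node(), new_node()
        for r, y in zip(node["rows"], node["labels"]):
            child = left if r[f] <= t else right
            child["rows"].append(r)
            child["labels"].append(y)
        node["split"], node["children"] = (f, t), (left, right)
        push(left)
        push(right)
        expansions += 1
    return root


def predict(node, row):
    """Walk from the root to a leaf and return the leaf's majority class."""
    while node["children"] is not None:
        f, t = node["split"]
        node = node["children"][0] if row[f] <= t else node["children"][1]
    return Counter(node["labels"]).most_common(1)[0][0]
```

Replacing the priority queue with a stack or a FIFO queue would give depth-first or breadth-first expansion instead. With best-first ordering, stopping the search early still spends the expansion budget on the nodes that, under the assumed ranking, reduce the overall training error the most.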

Similar resources

An Efficient Two Step Method for Classification of Spatial Data

Spatial data mining, i.e., discovery of interesting implicit knowledge in spatial databases, is a highly demanding field because very large amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. In this paper, an efficient method for building decision trees for...

Text Mining in Social Networks

Social networks are rich in various kinds of contents such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for a wide variety of applications such as keyword search, classification, and clustering. While search and classification are well known...

Comparison of genetic algorithm based prototype selection schemes

Prototype selection is the process of finding representative patterns from the data. Representative patterns help in reducing the data on which further operations such as data mining can be carried out. The current work discusses computation of prototypes using medoids [1], leaders [2] and distance based thresholds. After finding the initial set of prototypes, the optimal set is found by means of...

Algorithms and Applications for Universal Quantification in Relational

Queries containing universal quantification are used in many applications, including business intelligence applications and in particular data mining. We present a comprehensive survey of the structure and performance of algorithms for universal quantification. We introduce a framework that results in a complete classification of input data for universal quantification. Then we go on to identify th...

SPRINT: A Scalable Parallel Classifier for Data Mining

Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memo...

SLIQ: A Fast Scalable Classifier for Data Mining

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SL...

Publication date: 1996
