Column-store: Decision Tree Classification of Unseen Attribute Set
Abstract
A decision tree can be used to cluster frequently used attributes and thereby improve tuple reconstruction time in column-store databases. Because queries are ad hoc, strongly correlated attributes are grouped together using a decision tree so that they share a common minimum-support probability distribution. At the same time, the decision tree can act as a classifier that predicts the cluster for an unseen attribute set. In this paper we propose classifying and clustering unseen attribute sets with a decision tree to improve tuple reconstruction time.
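The classifier role described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm (the abstract does not specify the tree construction or the minimum-support computation); it is a generic ID3-style tree over binary "attribute present / absent" features. The attribute names, cluster labels, and the helpers `build_tree` and `classify` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of cluster labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def build_tree(rows, attributes):
    """Build an ID3-style tree.

    rows: list of (attribute_set, cluster_label) pairs (hypothetical query log).
    attributes: candidate attributes that may still be used for splitting.
    Returns a cluster label (leaf) or a dict describing an internal node.
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority

    def split_entropy(attr):
        # Weighted entropy after splitting on presence/absence of attr.
        present = [label for attrs, label in rows if attr in attrs]
        absent = [label for attrs, label in rows if attr not in attrs]
        if not present or not absent:
            return float("inf")  # the split separates nothing
        return (len(present) * entropy(present)
                + len(absent) * entropy(absent)) / len(rows)

    best = min(attributes, key=split_entropy)
    if split_entropy(best) == float("inf"):
        return majority
    rest = [a for a in attributes if a != best]
    return {
        "attr": best,
        "present": build_tree([r for r in rows if best in r[0]], rest),
        "absent": build_tree([r for r in rows if best not in r[0]], rest),
    }


def classify(tree, attribute_set):
    """Walk the tree to predict the cluster of a possibly unseen attribute set."""
    while isinstance(tree, dict):
        branch = "present" if tree["attr"] in attribute_set else "absent"
        tree = tree[branch]
    return tree


# Hypothetical training data: attribute sets from past queries with assumed clusters.
training = [
    (frozenset({"name", "email"}), "contact"),
    (frozenset({"name", "phone"}), "contact"),
    (frozenset({"name", "email", "phone"}), "contact"),
    (frozenset({"price", "quantity"}), "sales"),
    (frozenset({"price", "discount"}), "sales"),
    (frozenset({"price", "quantity", "discount"}), "sales"),
]
attributes = sorted({a for attrs, _ in training for a in attrs})
tree = build_tree(training, attributes)

# An attribute set that never appeared in the training data.
predicted = classify(tree, {"name", "address"})
```

In this sketch an unseen attribute set is routed down the tree by the attributes it shares with earlier queries, so it lands in the cluster whose columns would be reconstructed together.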
Similar resources
Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm
Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...
A New Constructive Induction Approach to Medical Datasets Analysis
The main goal of our research was to prepare a new algorithm for developing a decision rule set. These rules are gathered from a decision table. Every decision table was extended by adding a new descriptive attribute obtained by applying a new constructive induction method. Then, sets of decision rules were generated for the primary and the extended database, respectively. Next, final s...
Evaluation of liquefaction potential based on CPT results using C4.5 decision tree
The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...
Network Anomaly Identification using Supervised Classifier
In this paper we present a clustering-based classification method and apply it to network anomaly detection. A set of labeled training data consisting of normal and attack instances is divided into clusters, which are represented by their representative profiles consisting of attribute-value pairs for a selected subset of attributes. Each category of attack and normal instances are broken down in...
A Comparative Study on Decision Rule Induction for incomplete data using Rough Set and Random Tree Approaches
Handling missing attribute values is the most challenging process in data analysis. Many approaches can be adopted to handle missing attributes. In this paper, a comparative analysis of an incomplete dataset is made for future prediction using the rough set approach and random tree generation in data mining. The result of simple classification technique (using random tree ...