Mutual Information and Redundancy for Categorical Data
Authors

Abstract

Similar Resources
K-ANMI: A Mutual Information Based Clustering Algorithm for Categorical Data
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way that is similar to the popular k-means algorithm, and the goodness of clustering in each step is evaluated using a mutual information based criterion (namely, avera...
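The criterion the abstract names, average normalized mutual information (ANMI), averages the normalized mutual information (NMI) between a candidate partition and each attribute's induced partition. A minimal sketch of NMI between two labelings, using the geometric-mean normalization (the exact normalization in k-ANMI may differ; this is an illustration, not the paper's code):

```python
from collections import Counter
from math import log

def mutual_information(labels_a, labels_b):
    """Shannon mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    return sum((nab / n) * log((nab / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), nab in pab.items())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(labels_a, labels_b):
    """NMI with geometric-mean normalization; ANMI averages such scores
    over the partitions induced by each categorical attribute."""
    h = (entropy(labels_a) * entropy(labels_b)) ** 0.5
    return mutual_information(labels_a, labels_b) / h if h else 0.0
```

Two labelings that agree up to a relabeling get NMI 1, while independent labelings get NMI 0, which is what makes the averaged score a usable clustering objective.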
A Crash Course on Shannon's Mutual Information for Categorical Data Analysis
Here is a general form of a data set from the field of information retrieval. A corpus of documents (scientific papers, say) contains documents already labeled into topics (e.g. physics, bio, math), a list of keywords, and a count matrix M where Mx,y is the number of appearances (appropriately normalized to correct for differences in document length) of word x in any document of topic y. You can find a fe...
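From such a count matrix, the mutual information between words and topics follows directly by normalizing M into a joint distribution. A minimal sketch under that setup (the matrix and its interpretation are taken from the abstract; the function name is illustrative):

```python
import numpy as np

def matrix_mutual_information(M):
    """I(word; topic) in nats, computed from a nonnegative count matrix M[x, y]."""
    P = M / M.sum()                       # joint distribution p(x, y)
    px = P.sum(axis=1, keepdims=True)     # marginal p(x), column vector
    py = P.sum(axis=0, keepdims=True)     # marginal p(y), row vector
    mask = P > 0                          # skip zero cells (0 * log 0 := 0)
    return float((P[mask] * np.log(P[mask] / (px @ py)[mask])).sum())
```

A diagonal matrix (each word occurs in exactly one topic) gives the maximum, while a constant matrix (words carry no topic information) gives zero.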
G-ANMI: A mutual information based genetic clustering algorithm for categorical data
Identification of meaningful clusters from categorical data is a key problem in data mining. Recently, Average Normalized Mutual Information (ANMI) has been used to define categorical data clustering as an optimization problem. To find the globally optimal or near-optimal partition determined by ANMI, a genetic clustering algorithm (G-ANMI) is proposed in this paper. Experimental results show tha...
Feature selection based on mutual information and redundancy-synergy coefficient.
Mutual information is an important information measure for a feature subset. In this paper, a hashing mechanism is proposed to calculate the mutual information on the feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy of features in expressing the class feature, is defined via mutual information. The information maximization rule was applied to derive the heuristic...
Quantifying multivariate redundancy with maximum entropy decompositions of mutual information
Williams and Beer (2010) proposed a nonnegative mutual information decomposition, based on the construction of redundancy lattices, which allows separating the information that a set of variables contains about a target variable into nonnegative components interpretable as the unique information of some variables not provided by others as well as redundant and synergistic components. However, t...
Journal

Journal title: Communications for Statistical Applications and Methods

Year: 2006
ISSN: 2287-7843
DOI: 10.5351/ckss.2006.13.2.297