Balancing Interpretability and Predictive Accuracy for Unsupervised Tensor Mining

Authors

  • Ishmam Zabir
  • Evangelos E. Papalexakis
Abstract

The PARAFAC tensor decomposition has enjoyed increasing success in exploratory multi-aspect data mining scenarios. A major challenge remains estimating the number of latent factors (i.e., the rank) of the decomposition that yields high-quality, interpretable results. Previously, we proposed an automated tensor mining method which leverages a well-known quality heuristic from the field of Chemometrics, the Core Consistency Diagnostic (CORCONDIA), in order to automatically determine the rank for the PARAFAC decomposition. In this work, we set out to explore the trade-off between 1) the interpretability/quality of the results (as expressed by CORCONDIA), and 2) the predictive accuracy of the results, in order to further improve the rank estimation quality. Our preliminary results indicate that striking a good balance in that trade-off benefits rank estimation.
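To make the quality side of this trade-off concrete, the following is a minimal sketch of how a core consistency score can be computed for a three-way PARAFAC model. It is not the authors' implementation: the function names `khatri_rao`, `cp_als`, and `core_consistency` are hypothetical, the fit uses a plain alternating least squares loop with random initialization as a stand-in for whatever solver the paper uses, and the predictive accuracy side of the trade-off is not shown. The score follows the standard definition, 100 * (1 - ||G - T||^2 / R), where G is the least-squares core for the fitted factors and T is the superdiagonal core of ones.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row index runs over (u, v) with v fastest."""
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def cp_als(X, rank, n_iter=200, seed=0):
    """Illustrative PARAFAC fit of a 3-way array by alternating least squares.
    Component weights are absorbed into the factor matrices (no normalization)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode unfoldings whose column order matches khatri_rao above.
    X0 = X.reshape(I, J * K)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def core_consistency(X, A, B, C):
    """Core consistency in percent: 100 means the least-squares core is
    exactly superdiagonal, which is what a valid PARAFAC model implies."""
    R = A.shape[1]
    # Least-squares core for fixed factors: project each mode with a pseudoinverse.
    G = np.einsum('ijk,pi,qj,rk->pqr', X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    T = np.zeros((R, R, R))
    idx = np.arange(R)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / R)

if __name__ == "__main__":
    # Synthetic rank-2 tensor with a little noise (made-up data for illustration).
    rng = np.random.default_rng(42)
    A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (8, 7, 6))
    X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    X = X + 0.01 * rng.standard_normal(X.shape)
    for r in range(1, 5):
        print(r, core_consistency(X, *cp_als(X, r)))
```

In a rank search, one would typically scan candidate ranks and look for the largest rank whose score stays high. The score can drop sharply, even below zero, once the model is over-factored, which is the reason a second criterion such as predictive accuracy can help break ties.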

Similar resources

Modifying SO-PMI for Japanese Weblog Opinion Mining by Using a Balancing Factor and Detecting Neutral Expressions

We propose a variation of the SO-PMI algorithm for Japanese, for use in Weblog Opinion Mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. We first used the SO-PMI algorithm on Japanese in a way very similar to Turney’s original idea. The result of this trial leaned heavily toward positive opinions. We then expanded the reference words to ...

Unsupervised Interpretable Pattern Discovery in Time Series Using Autoencoders

We study the use of feed-forward convolutional neural networks for the unsupervised problem of mining recurrent temporal patterns mixed in multivariate time series. Traditional convolutional autoencoders lack interpretability for two main reasons: the number of patterns corresponds to the manually-fixed number of convolution filters, and the patterns are often redundant and correlated. To recov...

Supervised and Unsupervised Data Mining with an Evolutionary Algorithm

This paper describes our current research with RAGA (Rule Acquisition with a Genetic Algorithm). RAGA is a genetic algorithm and genetic programming hybrid that is designed for the tasks of supervised and certain types of unsupervised data mining. Since its initial release we have improved its predictive accuracy and data coverage, as well as its ability to generate more scalable rule hierarchi...

Automatic Unsupervised Tensor Mining with Quality Assessment

A popular tool for unsupervised modelling and mining multi-aspect data is tensor decomposition. In an exploratory setting, where no labels or ground truth are available, how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure in the interpretation of our results? In this paper, we in...

Scalable Boolean Tensor Factorizations using Random Walks

Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more and more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide ...

Journal title:
  • CoRR

Volume: abs/1709.01147

Publication date: 2017