Collective Factorization for Relational Data: An Evaluation on the Yelp Datasets
Authors
Abstract
Matrix factorization has found incredible success and widespread application as a collaborative-filtering approach to recommendation. Unfortunately, incorporating additional sources of incomplete and noisy evidence is quite difficult in such models, yet it is often crucial for obtaining further gains in accuracy. For example, in the Yelp datasets, additional information about businesses from reviews, categories, and attributes should be leveraged for predicting ratings, even though this information may be inaccurate and only partially observed. Instead of creating customized solutions that are specific to each type of evidence, in this paper we present a generic approach to the factorization of relational data that collectively models all the relations in the database. By learning a set of factors that are shared across all the relations, the model is able to incorporate observed information from every relation while also predicting all the relations of interest. Our evaluation on four Yelp datasets demonstrates effective use of the additional information for held-out rating and attribute prediction; further, we obtain accurate models even for cold-start businesses, for which we do not observe any ratings or attributes. We also present joint visualizations of word, category, and attribute factors, demonstrating learned dependencies between them that are not directly observed in the data.
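To make the idea of factors "shared across all the relations" concrete, here is a minimal sketch of collective factorization, not the authors' implementation. It assumes one latent vector per entity (user, business, category), a plain dot product to score a cell of any relation, squared loss with L2 regularization, and stochastic gradient descent over the observed cells only. The function names `collective_factorize` and `predict`, the hyperparameters, and the toy data are all hypothetical. Business 2 has no observed ratings (a cold-start case), yet its rating can still be scored because its vector is also trained by the category relation.

```python
import random


def collective_factorize(relations, num_entities, rank=4, lr=0.02,
                         reg=0.01, epochs=500, seed=0):
    """Learn one latent vector per entity, shared by every relation it appears in.

    relations: dict name -> (row_type, col_type, [(row_id, col_id, value), ...])
    num_entities: dict entity_type -> number of entities of that type
    Returns: dict entity_type -> list of latent vectors (lists of floats).
    """
    rng = random.Random(seed)
    factors = {
        etype: [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
        for etype, n in num_entities.items()
    }
    # Pool the observed cells of *all* relations, so each relation sends
    # gradients into the shared entity factors; unobserved cells are skipped.
    cells = [(rt, ct, i, j, v)
             for rt, ct, observed in relations.values()
             for i, j, v in observed]
    for _ in range(epochs):
        rng.shuffle(cells)
        for rt, ct, i, j, v in cells:
            p, q = factors[rt][i], factors[ct][j]
            err = v - sum(a * b for a, b in zip(p, q))
            for k in range(rank):
                pk, qk = p[k], q[k]
                # SGD step on the squared error plus an L2 penalty.
                p[k] += lr * (err * qk - reg * pk)
                q[k] += lr * (err * pk - reg * qk)
    return factors


def predict(factors, row_type, col_type, i, j):
    """Score a cell of any relation as a dot product of the shared factors."""
    return sum(a * b for a, b in zip(factors[row_type][i], factors[col_type][j]))


# Hypothetical toy data: business 2 has no observed ratings (cold start),
# but its category memberships are observed.
num_entities = {"user": 2, "business": 3, "category": 2}
relations = {
    "rating": ("user", "business",
               [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]),
    "in_category": ("business", "category",
                    [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0),
                     (1, 1, 1.0), (2, 0, 1.0), (2, 1, 0.0)]),
}

factors = collective_factorize(relations, num_entities)
# Rating score for user 0 on the cold-start business 2.
print(predict(factors, "user", "business", 0, 2))
```

Using a single loss and a single rank for every relation keeps the sketch short; a fuller model would likely add bias terms, per-relation losses (for example a logistic loss for binary attributes), and a held-out evaluation.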
Similar resources
A Three-Way Model for Collective Learning on Multi-Relational Data
Relational learning is becoming increasingly important in many areas of application. Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. We show that unlike other tensor approaches, our method is able to perform collective learning via the latent components of the model and provide an efficient algorithm to compute the factorization. We sub...
Collectively Embedding Multi-Relational Data for Predicting User Preferences
Matrix factorization has found incredible success and widespread application as a collaborative filtering based approach to recommendations. Unfortunately, incorporating additional sources of evidence, especially ones that are incomplete and noisy, is quite difficult to achieve in such models, however, is often crucial for obtaining further gains in accuracy. For example, additional information...
Collective vs Independent Classification in Statistical Relational Learning
Statistical Relational Learning (SRL) addresses the problem of performing probabilistic inference on data instances that are correlated. Collective classification is an important SRL task, in which related data instances are classified simultaneously as opposed to independently which is done in independent Machine Learning. In several studies conducted in the last decade, it has been shown that...
Language and Identity in the Iranian Context: The Impact of Identity Aspects on EFL Learners' Achievement
Identity orientations refer to the relative importance that individuals place on various identity attributes or characteristics such as race, religion, culture and language when constructing their self-definitions (Chew, 2007; Cheek, 1989). Accordingly, the present study aims at identifying the impact of identity aspects on the Iranian learners' English language achievements at Shiraz Universit...
Deep Collective Inference
Collective inference is widely used to improve classification in network datasets. However, despite recent advances in deep learning and the successes of recurrent neural networks (RNNs), researchers have only just recently begun to study how to apply RNNs to heterogeneous graph and network datasets. There has been recent work on using RNNs for unsupervised learning in networks (e.g., graph clu...