Bipartite Graph Sampling Methods for Sampling Recommendation Data
Abstract
Sampling is a common practice in academic and industry efforts on recommendation algorithm evaluation and selection. Experimental analysis often uses a subset of the entire user-item interaction data available in the operational recommender system, often derived by including all transactions associated with a subset of users selected uniformly at random. Our paper formally studies the sampling problem for recommendation to understand to what extent population-based algorithm evaluation results correspond with sample-based results under different sampling methods. We use a bipartite graph to represent the key input data of user-item interaction for recommendation algorithms and build on the literature on unipartite graph sampling to develop sampling methods for our context of bipartite graph sampling. We also develop several metrics for assessing the quality of a given sample, including performance recovery and ranking recovery measures for assessing both single-sample and multiple-sample recovery performance. Based on the empirical results from two real-world datasets, we provide some general recommendations for sampling for recommendation algorithm evaluation.
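The baseline practice the abstract describes (keep all transactions of a uniformly random subset of users) can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the function name `sample_by_users`, the `fraction` and `seed` parameters, and the representation of the bipartite graph as a list of (user, item) edges are all assumptions made for this example.

```python
import random
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (user, item) edge of the bipartite graph


def sample_by_users(edges: Iterable[Edge], fraction: float, seed: int = 0) -> List[Edge]:
    """Keep every interaction of a uniformly random subset of users.

    Hypothetical helper: `fraction` is the share of distinct users to keep.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    edge_list = list(edges)
    # Sort the distinct users so the draw is reproducible for a given seed.
    users = sorted({user for user, _ in edge_list}, key=repr)
    if not users:
        return []
    k = max(1, round(fraction * len(users)))
    rng = random.Random(seed)
    chosen = set(rng.sample(users, k))
    # Item nodes are kept implicitly: an item survives only if a chosen user touched it.
    return [(user, item) for user, item in edge_list if user in chosen]


if __name__ == "__main__":
    interactions = [
        ("u1", "a"), ("u1", "b"),
        ("u2", "a"),
        ("u3", "c"),
        ("u4", "b"), ("u4", "c"),
    ]
    print(sample_by_users(interactions, fraction=0.5, seed=1))
```

Because whole users are kept or dropped, each sampled user's degree is preserved exactly, while item degrees in the sample shrink. That asymmetry is one reason a sample drawn this way may not rank algorithms the same way the full data does.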
Similar resources
Sampling Online Social Networks by Random Walk with Indirect Jumps
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In ...
Estimating Node Similarity by Sampling Streaming Bipartite Graphs
Bipartite graph data increasingly occurs as a stream of edges that represent transactions, e.g., purchases by retail customers. Applications such as recommender systems employ neighborhood-based measures of node similarity, such as the pairwise number of common neighbors (CN) and related metrics. While the number of node pairs that share neighbors is potentially enormous, in real-world graphs on...
Efficient Sampling for Bipartite Matching Problems
Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce mode...
Vertex-Context Sampling for Weighted Network Embedding
Network embedding methods have garnered increasing attention because of their effectiveness in various information retrieval tasks. The goal is to learn low-dimensional representations of vertexes in an information network and simultaneously capture and preserve the network structure. Critical to the performance of a network embedding method is how the edges/vertexes of the network are sampled for ...
Perfect Matchings in Õ(n) Time in Regular Bipartite Graphs
We consider the well-studied problem of finding a perfect matching in d-regular bipartite graphs with 2n vertices and m = nd edges. While the best-known algorithm for general bipartite graphs (due to Hopcroft and Karp) takes O(m√n) time, in regular bipartite graphs, a perfect matching is known to be computable in O(m) time. Very recently, the O(m) bound was improved to O(min{m, n^{2.5} ln n / d})...