Large-Scale Community Detection on YouTube for Topic Discovery and Exploration
نویسندگان
چکیده
Detecting coherent and well-connected communities inside large-scale graphs is an interesting problem that can provide useful insight into the graph structure and individual communities. It can also serve as the basis for content exploration and discovery within the graph. Clustering is a popular technique for community detection, however, the two main categories of clustering algorithms, i.e, global and local algorithms, have either scalability or usability issues, e.g, global algorithms do not scale well, and local algorithms may cover only a portion of the graph. Such one-stage algorithms typically optimize one objective function and do not work well in settings where we need to optimize various coverage, coherence and connectivity metrics. In this paper, we study large-scale community detection over a real-world graph composed of millions of YouTube videos. In particular, we present a multi-stage scalable clustering algorithm, combining a pre-processing stage, a local clustering stage, and a post-processing stage to generate clusters of YouTube videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the proposed multi-stage clustering algorithms for YouTube videos. We also use extracted entities to attach meaningful labels to our clusters. Our use of local algorithms for global clustering, and its implementation and practical evaluation on such a large scale is a first of its kind.
منابع مشابه
A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem
Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...
متن کاملKnowledge Discovery from Community-Contributed Multimedia
T he prevalence of imageand videocapturing devices and the advent of media-sharing services such as Flickr and YouTube have drastically increased the volume of community-contributed multimedia. For example, there are reportedly more than four billion images in Flickr and 24 hours of new videos are uploaded to YouTube every minute. Such a vast amount of photos, videos, and music shared via websi...
متن کاملA TWO-STAGE METHOD FOR DAMAGE DETECTION OF LARGE-SCALE STRUCTURES
A novel two-stage algorithm for detection of damages in large-scale structures under static loads is presented. The technique utilizes the vector of response change (VRC) and sensitivities of responses with respect to the elemental damage parameters (RSEs). It is shown that VRC approximately lies in the subspace spanned by RSEs corresponding to the damaged elements. The property is leveraged in...
متن کاملCommunity Detection using a New Node Scoring and Synchronous Label Updating of Boundary Nodes in Social Networks
Community structure is vital to discover the important structures and potential property of complex networks. In recent years, the increasing quality of local community detection approaches has become a hot spot in the study of complex network due to the advantages of linear time complexity and applicable for large-scale networks. However, there are many shortcomings in these methods such as in...
متن کاملCommunity Specific Temporal Topic Discovery from Social Media
Studying temporal dynamics of topics in social media is very useful to understand online user behaviors. Most of the existing work on this subject usually monitors the global trends, ignoring variation among communities. Since users from different communities tend to have varying tastes and interests, capturing communitylevel temporal change can improve the understanding and management of socia...
متن کامل