A Method to Extend Existing Document Clustering Procedures in Order to Include Relational Information
Abstract
We consider the problem of clustering nodes in a graph where each node also has internal content (e.g., the Web, where nodes are web pages). In this context we can distinguish two kinds of information: content information and structural information. Standard clustering methods use content information only, while graph clustering methods are usually based on the graph structure alone. Relatively recently, researchers have proposed combining both types of information. In this paper we propose a very simple, yet hitherto unexplored, method for doing this by extending existing clustering procedures that use content information.
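To make the distinction between content and structural information concrete, here is a minimal Python sketch of one generic way the two could be blended. It is not the method proposed in the paper, which the abstract does not specify. The function names, the weight `alpha`, the threshold-based grouping, and the toy documents and links are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(docs, links, alpha=0.5):
    """Blend content similarity with link structure (illustrative assumption).

    alpha = 0 uses content only; alpha = 1 uses the graph structure only.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    linked = {frozenset(edge) for edge in links}
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            content = cosine(vectors[i], vectors[j])
            structure = 1.0 if i == j or frozenset((i, j)) in linked else 0.0
            sim[i][j] = (1 - alpha) * content + alpha * structure
    return sim


def threshold_clusters(sim, threshold):
    """Label nodes by connected components of the graph 'sim >= threshold'."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical toy data: four "pages" and two hyperlinks between them.
docs = [
    "graph clustering methods",
    "graph clustering algorithms",
    "cooking pasta recipes",
    "italian pasta dishes",
]
links = [(0, 1), (2, 3)]

content_only = threshold_clusters(combined_similarity(docs, links, alpha=0.0), 0.5)
with_links = threshold_clusters(combined_similarity(docs, links, alpha=0.5), 0.5)
print(content_only)
print(with_links)
```

In this toy setting the two pasta pages share too few words to be grouped on content alone, while the link between them raises their combined similarity above the threshold. Any similarity-based clustering procedure could consume the blended matrix in place of the simple threshold grouping used here.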
Related resources
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information from them is essential. A possible solution is to cluster the XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining; it is the unsupervised classification of similar documents into different groups. The most important step...
Scalable Clustering of Documents with Multiple Membership
Document clustering has recently garnered a large amount of attention from the IR, data mining, and machine learning research communities as an effective way of not only organizing textual information, but also for discovering interesting patterns in that information. Most existing methods, however, suffer from two main drawbacks. First, most clustering algorithms are very restrictive, as docum...
Evolutionary User Clustering Based on Time-Aware Interest Changes in the Recommender System
The abundance of data on the Internet has created problems for users and has caused confusion in finding the proper information. Also, users' tastes and preferences change over time. Recommender systems can help users find useful information. Due to changing interests, systems must be able to evolve. In order to solve this problem, users are clustered that determine the most desirable users, it pa...