Clustering Web Sessions Using Extended General Pages
Abstract
We study Web session clustering to find groups of similar sessions and discover user access patterns on a Web site. We extend the general page concept presented in (Fu, Sandhu and Shih 2000) by including partial document names and dynamic pages, and use an extended general page (EGP) to represent the many individual page URLs that share it. We present two extensions of ROCK, a hierarchical clustering algorithm (Guha, Rastogi and Shim 2000). The first is a notion of EGP count that we add to the session similarity calculation. The second is a goodness threshold that we adopt to prevent certain clusters from merging with others. Further, we propose a set of measurements for assessing the results of clustering boolean and categorical data, which help users identify their desired clustering results. In our experiments, we applied the ROCK and extended ROCK (E-ROCK) algorithms to cluster half a month of Web logs from a customer service Web site at HP. The experimental results showed that E-ROCK alleviated the large-cluster problem of the ROCK algorithm and improved intra-cluster similarity.
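As a rough illustration of the baseline that the abstract builds on, the following Python sketch computes the neighbor, link, and goodness quantities of ROCK as described by Guha, Rastogi and Shim. It covers only plain ROCK: the EGP count and the goodness threshold are not specified in the abstract and are not reproduced here. The session contents, the page paths, the threshold value `theta = 0.5`, the use of Jaccard similarity on sets of pages, and the choice not to count a session as its own neighbor are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sessions given as sets of page identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(sessions, theta):
    """For each session, collect the indices of other sessions with similarity >= theta.

    Assumption: a session is not counted as its own neighbor.
    """
    nbrs = [set() for _ in sessions]
    for i, j in combinations(range(len(sessions)), 2):
        if jaccard(sessions[i], sessions[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def link(nbrs, i, j):
    """Number of neighbors that sessions i and j have in common."""
    return len(nbrs[i] & nbrs[j])


def goodness(cluster_a, cluster_b, nbrs, theta):
    """ROCK goodness measure for merging two clusters (lists of session indices).

    Cross-cluster links are normalized by the expected number of links,
    using f(theta) = (1 - theta) / (1 + theta).
    """
    exponent = 1 + 2 * (1 - theta) / (1 + theta)
    na, nb = len(cluster_a), len(cluster_b)
    cross_links = sum(link(nbrs, i, j) for i in cluster_a for j in cluster_b)
    return cross_links / ((na + nb) ** exponent - na ** exponent - nb ** exponent)


# Hypothetical sessions: each is the set of (generalized) pages it visited.
sessions = [
    {"/support", "/drivers", "/download"},
    {"/support", "/drivers", "/faq"},
    {"/support", "/drivers", "/download", "/faq"},
    {"/sales", "/cart"},
]
theta = 0.5
nbrs = neighbor_sets(sessions, theta)
g_similar = goodness([0], [1], nbrs, theta)    # two support-related sessions
g_unrelated = goodness([0], [3], nbrs, theta)  # support session vs. sales session
```

In an agglomerative loop, ROCK repeatedly merges the pair of clusters with the highest goodness. A threshold of the kind the abstract mentions would presumably skip merges whose goodness falls below some cutoff, but the exact rule is not given here.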
Similar resources
Clustering of Web Users Based on Access Patterns
The clustering of Web users based on their access patterns is studied. Access patterns of the Web users are extracted from Web servers' log files, and then organized into sessions which represent episodes of interaction between Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to the page hierarchy which organizes pages according to...
Clustering Web Sessions by Sequence Alignment
Clustering means grouping similar objects such that objects within the same group bear similarity to each other while objects in different groups are dissimilar to each other. As an important component of data mining, much research on clustering has been conducted in different disciplines. In the context of web mining, clustering could be used to cluster similar clickstreams to determ...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the best-known methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Web sessions clustering for behavioral targeting
In this paper we present our ongoing effort to compare web session clusters based on different web session representations. Sessions within the same cluster represent common navigation patterns. We assume that users with the same navigation patterns have common interests and motivations at a point in time. Therefore, we represent sessions based on descriptions extracted from the URLs as well a...
Study and Evaluation of user’s behavior in e-commerce Using Data Mining
Data mining has matured as a field of basic and applied research in computer science. The objective of this dissertation is to evaluate, propose and improve the use of some of the recent approaches, architectures and Web mining techniques (collecting personal information from customers), which are the means of utilizing data mining methods to induce and extract useful information from Web information ...