Web Users Session Analysis Using DBSCAN and Two Phase Utility Mining Algorithms
Abstract
One of the important issues in data mining is the interestingness problem. In a typical data mining process, the number of discovered patterns can easily exceed a human user's capacity to identify the interesting results. To address this problem, utility measures have been used to reduce the set of patterns before presenting them to the user. A frequent itemset reflects only the statistical correlation between items; it does not reflect their semantic significance. The proposed approach uses utility-based itemset mining to overcome this limitation. The proposed system first applies the DBSCAN clustering algorithm, which identifies the behavior of users' page visits and the order in which the visits occur. After clustering, the two-phase high utility mining algorithm is applied to find itemsets that contribute high utility. Mining web access sequences can discover very useful knowledge from web logs, with broad applications, and mining useful Web path traversal patterns is a very important research issue in Web technologies. Knowledge of the frequent Web path traversal patterns enables us to discover the most interesting websites traversed by users. However, considering only the binary (presence/absence) occurrences of websites in the traversal paths may not reflect real-world scenarios. If we instead treat the time spent by each user as the utility value of a website, more interesting web traversal paths can be discovered using the proposed two-phase algorithm. User page visits are sequential in nature. In this paper, the MSNBC web navigation dataset is used to compare efficiency and performance. A key task in web usage mining is finding groups of users who share common interests.
General Terms: Web session mining, log analysis.
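The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each session is a mapping from page category to time spent in seconds, and the function name `two_phase`, the sample `SESSIONS`, and the threshold are all hypothetical. Phase 1 prunes candidates with the transaction-weighted utility (TWU), an upper bound that can only shrink as an itemset grows; Phase 2 computes the exact utility of the surviving candidates.

```python
from itertools import combinations

def two_phase(sessions, min_util):
    """Return {itemset: utility} for itemsets whose total time spent >= min_util."""
    # Transaction utility: total time spent in each session.
    tu = [sum(s.values()) for s in sessions]

    def twu(itemset):
        # Sum of the transaction utilities of sessions containing the itemset.
        return sum(t for s, t in zip(sessions, tu) if itemset.issubset(s))

    # Phase 1: level-wise candidate generation, pruned by TWU.
    pages = sorted({p for s in sessions for p in s})
    level = [frozenset([p]) for p in pages if twu(frozenset([p])) >= min_util]
    candidates = []
    while level:
        candidates.extend(level)
        known = set(level)
        k = len(level[0])
        next_level = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Every k-subset must itself be a high-TWU itemset.
            if all(frozenset(c) in known for c in combinations(union, k)):
                if twu(union) >= min_util:
                    next_level.add(union)
        level = sorted(next_level, key=sorted)

    # Phase 2: compute exact utilities and keep the truly high-utility itemsets.
    result = {}
    for c in candidates:
        util = sum(sum(s[p] for p in c) for s in sessions if c.issubset(s))
        if util >= min_util:
            result[c] = util
    return result

# Hypothetical sessions: page category -> seconds spent (illustrative values only).
SESSIONS = [
    {"frontpage": 30, "news": 120},
    {"frontpage": 10, "sports": 200},
    {"news": 90, "sports": 60, "frontpage": 5},
    {"news": 150},
]

if __name__ == "__main__":
    for itemset, util in two_phase(SESSIONS, 200).items():
        print(sorted(itemset), util)
```

In this toy data, "frontpage" alone falls below the threshold while {"frontpage", "news"} exceeds it, which shows why exact utility cannot be used for pruning directly and why the TWU bound is needed in the first phase.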
Similar resources
Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms
Background and Aim: As web surfing has entered the lifestyle of a vast majority of people in society, and given the need for more accurate social and cultural policy making in the field, the authors set out to analyze how users in society view different websites, so as to help politicians and practitioners. Methods: The design science research method is used in this research...
Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions
Clustering techniques are widely used in “Web Usage Mining” to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are...
A density based clustering approach to distinguish between web robot and human requests to a web server
Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of the advantages of robots, they may occupy bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...
Discovery of Web Usage Profiles Using Various Clustering Techniques
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are being applied extensively to Web log data. Clustering techniques are widely...
G-DBSCAN: A GPU Accelerated Algorithm for Density-based Clustering
With the advent of Web 2.0, we see a new and differentiated scenario: there is more data than can be effectively analyzed. Organizing this data has become one of the biggest problems in Computer Science. Many algorithms have been proposed for this purpose, notably those from the Data Mining area, and specifically the clustering algorithms. However, these algorithms are still a compu...