Mining Web Activity Logs
Abstract
Web mining has been a major research topic during the last decade. Traditional sources of information on the Web include the text of web pages, the structure of the pages as organized by the web page creators, and the link structure of the Web. Another emerging source of information is the set of tags for web pages collected and organized by collaborative tagging systems (e.g. del.icio.us, StumbleUpon). Search engines are interested in organizing all the information provided by these sources so that they can respond accurately to different search tasks. However, the widespread use of web search engines by billions of people worldwide has created another interesting source of information. The search and browsing activity of these users implicitly captures human knowledge and reveals the most interesting places on the Web. By looking at the search queries people issue, we can extract a succinct, user-generated summary for pages appearing in the search results. For example, such a summary for Google’s main page could be: google, search engine, larry page, pagerank. These summaries tend to be accurate because web users have learned over the years to formulate queries with an emphasis on selecting the least ambiguous and most meaningful words. In a way, in addition to requesting new information, search users are also revealing new knowledge (that perhaps does not even exist yet on the Web) through the keywords they select for their queries and the navigational paths they follow in response. The paths users follow are also implicit votes on the interesting parts of the Web. Search engines are the first to record the users’ search history in order to analyze it and improve their services in the future. These query logs contain the terms users issue in a search engine along with the search results people click on in response.
In addition, through browser toolbars, search engines can record web activity logs that contain the history of pages a user visits. Although web activity and query logs contain invaluable information for the search engines, web users can also benefit explicitly from such data. To make that possible, we have to identify meaningful ways to mine the web activity logs and make the results available back to the web users. The primary goal of my research has been to investigate new uses of web activity logs in formulating and solving novel problems related to the Web. The research on query and web activity logs that we have performed, with collaborators from Yahoo! and Microsoft Research, applies to problems related to sponsored search, recommendation systems, web information extraction systems, and web navigation systems. Section 1 outlines the research I have performed to date, while Section 2 describes my current research and plans for the future.
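The abstract's idea of deriving a user-generated page summary from query logs can be illustrated with a minimal sketch. It is not the author's method: it assumes a log of (query, clicked URL) pairs and simply counts individual query terms per clicked page, whereas the example summary in the abstract also contains multi-word phrases. The records in `QUERY_LOG` and the function name `summarize_pages` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical query-log records: (query text, clicked URL).
# The data and its layout are assumptions made for illustration only.
QUERY_LOG = [
    ("google search engine", "google.com"),
    ("google", "google.com"),
    ("larry page google", "google.com"),
    ("pagerank", "google.com"),
    ("stumbleupon", "stumbleupon.com"),
]


def summarize_pages(log, top_k=3):
    """Return, for each clicked URL, its top_k most frequent query terms."""
    term_counts = defaultdict(Counter)
    for query, url in log:
        # Split on whitespace; a real system would need proper tokenization.
        term_counts[url].update(query.lower().split())
    return {
        url: [term for term, _ in counts.most_common(top_k)]
        for url, counts in term_counts.items()
    }


if __name__ == "__main__":
    for url, terms in summarize_pages(QUERY_LOG).items():
        print(url, terms)
```

Counting clicks per term treats every click as one implicit vote, which mirrors the abstract's framing; any weighting by rank, dwell time, or user would be a further assumption.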
Similar resources
On Analyzing Web Log Data: A Parallel Sequence Mining Algorithm
Activities at enterprise-class web sites, as well as other web sites, are usually recorded via web logs. Collected logs consist of records from many click streams, which are defined as collections of hits (requests) from a specific user during a specific session. Using web logs is the most common way of collecting click stream data at this time. Thus data warehouses are built based on the cruci...
Efficient Frequent Pattern Mining on Web Logs
Mining frequent patterns from Web logs is an important data mining task. Candidate-generation-and-test and pattern-growth are two representative frequent pattern mining approaches. We have conducted extensive experiments on real world Web log data to analyse the characteristics of Web logs and the behaviours of these two approaches on Web logs. To improve the performance of current algorithms on...
Online and Incremental Mining of Separately-Grouped Web Access Logs
The rising popularity of electronic commerce makes data mining an indispensable technology for business competitiveness. The World Wide Web provides abundant raw data in the form of web access logs, web transaction logs and web user profiles. Without data mining tools, it is impossible to make any sense of such massive data. In this paper, we focus on web usage mining because it deals most appr...
An Efficient Algorithm for Data Cleaning of Web Logs with Spider Navigation Removal
The World Wide Web is growing massively larger with the exponential growth of websites providing the user with heaps of information. Text files called web logs are used to store the clicks of a user whenever a user visits a website. Web usage mining is a stream of web mining that involves applying mining techniques to the server logs containing the user clickstreams....
Web Usage Mining: users' navigational patterns extraction from web logs using ant-based clustering method
Web Usage Mining is the process of applying data mining techniques to the discovery of usage patterns from data extracted from Web Log files. It mines the secondary data (web logs) derived from the users' interaction with the web pages during certain period of Web sessions. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. In this paper, w...
Safelog: Supporting Web Search and Mining by Differentially-Private Query Logs
Query logs can be very useful for advancing web search and web mining research. Since these web query logs contain private, possibly sensitive data, they need to be effectively anonymized before they can be released for research use. Anonymization of query logs differs from that of structured data since they are generated based on natural language and the vocabulary (domain) is infinite. This u...
Publication date: 2009