A density-based clustering approach to distinguish between web robot and human requests to a web server
Abstract:
Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Despite their advantages, robots may occupy bandwidth and reduce the performance of web servers. Although a variety of studies exist, there is still no accurate method for classifying huge data sets of web visitors in a reasonable amount of time. Moreover, such a method should be insensitive to the ordering of instances and produce deterministic, accurate results. Therefore, this paper presents a density-based clustering approach using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify the web visitors of two large real-world data sets. We propose two new features, based on the behavioral patterns of visitors, to describe them. In addition, we consider 12 common features and use a test of the significance of the difference (t-test) to reduce the dimensionality and overcome one of the disadvantages of DBSCAN. Based on supervised evaluation metrics, the proposed algorithm achieves a Jaccard metric of 95% and produces two clusters with entropy and purity of 0.024 and 0.97, respectively. Furthermore, in terms of clustering quality and accuracy, the proposed method performs better than state-of-the-art algorithms. Finally, the results indicate that some known web robots imitate human users, which makes them difficult to identify.
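The abstract names DBSCAN as the clustering step but gives no parameters. The following is a minimal pure-Python sketch of textbook DBSCAN, not the authors' implementation: the two-feature session vectors, the `eps` and `min_pts` values, and the helper names `region_query` and `dbscan` are hypothetical, chosen only to show how dense groups of sessions become clusters while isolated sessions are labeled noise (-1).

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point (cluster id >= 0, or NOISE)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing
        cluster_id += 1
    return labels


# Hypothetical per-session feature vectors, already scaled to [0, 1].
sessions = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.18), (0.09, 0.21),  # dense group A
    (0.90, 0.85), (0.88, 0.87), (0.91, 0.83), (0.89, 0.86),  # dense group B
    (0.50, 0.10),                                            # isolated session
]
print(dbscan(sessions, eps=0.05, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Core and noise points do not depend on the order in which sessions are visited; a border point reachable from two clusters goes to whichever cluster reaches it first. The brute-force neighbor search is O(n²), so for large access logs a spatial index would normally replace `region_query`.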
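The abstract reports using a t-test to reduce the 12 common features before clustering, but does not spell out the procedure. A minimal sketch under stated assumptions: it computes Welch's two-sample t statistic for each feature over sessions already labeled robot or human, and keeps a feature when |t| reaches a fixed cut-off. The feature names, sample values, the helpers `welch_t` and `select_features`, and the `t_threshold=2.0` cut-off are all hypothetical; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (sample variances, no equal-variance assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: treat any mean difference as maximally significant.
        return 0.0 if mean(a) == mean(b) else math.copysign(math.inf, mean(a) - mean(b))
    return (mean(a) - mean(b)) / se


def select_features(robot_rows, human_rows, names, t_threshold=2.0):
    """Keep features whose robot and human means differ with |t| >= t_threshold.

    t_threshold=2.0 is an assumed rough cut-off (about p < 0.05 for large samples).
    """
    selected = []
    for k, name in enumerate(names):
        robot_vals = [row[k] for row in robot_rows]
        human_vals = [row[k] for row in human_rows]
        if abs(welch_t(robot_vals, human_vals)) >= t_threshold:
            selected.append(name)
    return selected


# Hypothetical labeled sessions: (image_ratio, night_ratio, session_length_s).
names = ["image_ratio", "night_ratio", "session_length_s"]
robot_rows = [(0.01, 0.40, 120), (0.02, 0.55, 80), (0.00, 0.35, 300), (0.03, 0.50, 150)]
human_rows = [(0.60, 0.45, 90), (0.55, 0.38, 200), (0.70, 0.52, 140), (0.65, 0.41, 110)]
print(select_features(robot_rows, human_rows, names))  # ['image_ratio']
```

Only the feature whose robot and human means are clearly separated survives; dropping the others shrinks the space DBSCAN has to measure distances in.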
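The reported Jaccard, entropy and purity values are supervised measures that compare cluster assignments with known robot/human labels. A minimal sketch of commonly used definitions, assumed rather than taken from the paper: purity as the fraction of sessions belonging to their cluster's majority class, entropy as the size-weighted average of per-cluster class entropy in bits (lower is better), and the pair-counting Jaccard coefficient. The helper names and toy labels are hypothetical, and DBSCAN noise (label -1) would simply count as one more cluster here.

```python
import math
from collections import Counter
from itertools import combinations


def contingency(clusters, classes):
    """Map each cluster id to a Counter of the true class labels it contains."""
    table = {}
    for c, y in zip(clusters, classes):
        table.setdefault(c, Counter())[y] += 1
    return table


def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    table = contingency(clusters, classes)
    return sum(max(cnt.values()) for cnt in table.values()) / len(classes)


def entropy(clusters, classes):
    """Size-weighted average of per-cluster class entropy, in bits (0 = pure)."""
    table = contingency(clusters, classes)
    n = len(classes)
    total = 0.0
    for cnt in table.values():
        size = sum(cnt.values())
        h = -sum((m / size) * math.log2(m / size) for m in cnt.values())
        total += (size / n) * h
    return total


def jaccard(clusters, classes):
    """Pair-counting Jaccard: SS / (SS + SD + DS) over all pairs of points."""
    ss = sd = ds = 0
    for i, j in combinations(range(len(classes)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            ss += 1  # pair kept together, correctly
        elif same_cluster:
            sd += 1  # pair wrongly merged
        elif same_class:
            ds += 1  # pair wrongly split
    denom = ss + sd + ds
    return ss / denom if denom else 0.0  # undefined if no pair is together; 0.0 by convention here


# Hypothetical result: cluster 0 caught three robots and one human.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
classes = ["robot", "robot", "robot", "human", "human", "human", "human", "human"]
print(purity(clusters, classes))             # 0.875
print(round(entropy(clusters, classes), 4))  # 0.4056
print(jaccard(clusters, classes))            # 0.5625
```

A single misplaced human session already pulls purity to 0.875 and Jaccard to 0.5625 on this tiny sample, which gives a sense of how close to perfect a clustering must be to reach the purity of 0.97 and entropy of 0.024 quoted above.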
Similar resources
Iranian English learners’ perception and personality: A dual approach to investigating influential factors on willingness to communicate
Previous studies on willingness to communicate (WTC) have shown the influence of many individual or situational factors on students’ tendency to engage in classroom communication, in which WTC has been viewed either at the trait level or at the situational level. However, due to the complexity of the notion of willingness to communicate, the present study suggests that these two strands are ...
A benchmarking approach to optimal asset allocation for insurers and pension funds
Uncertainty in the financial market will be driven by underlying Brownian motions, while the assets are assumed to be general stochastic processes adapted to the filtration of the Brownian motions. The goal of this study is to calculate the accumulated wealth in order to optimize the expected terminal value using a suitable utility function. This thesis introduced Lim-Wong’s benchmark fun...
A frame semantic approach to the study of translating cultural scripts in Salinger's Franny and Zooey
The frame semantic theory is a nascent approach in the area of translation studies which goes beyond linguistic barriers and helps us incorporate cognitive and cultural factors into the study of translation. Based on Rojo's analytical model (2002b), which centers on the frames or knowledge structures activated in the text, the present research explores the various translation problems that...
A new type-II fuzzy logic based controller for non-linear dynamical systems with application to a 3-PSP parallel robot
Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields due to their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of type-I fuz...
Expert Discovery: A web mining approach
Expert discovery is a quest to answer the question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” An expert with domain knowledge in any field is crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues of the expert-finding task in a real-world community. Collabor...
Modeling web requests: A multifractal approach
World Wide Web (WWW) caching is used to improve network latency and bandwidth usage by storing previously requested files in a cache. Ideally, the cache replacement policy should account for the intrinsic characteristics of WWW traffic, which include temporal locality, spatial locality, and popularity. In this paper, we accurately capture these three characteristics in a stochastic model, which...
Volume 6, Issue 1
Pages 77-89
Publication date: 2014-01-01