Efficient Malicious URL based on Feature Classification
Authors
Abstract
Deceitful and malicious web sites pose a significant danger to desktop security, integrity and privacy. Malicious web pages that use drive-by download attacks or social engineering techniques to install unwanted software on a user's computer have become the main avenue for the proliferation of malicious code. Detecting malicious URLs has become difficult because of phishing campaigns and attackers' efforts to evade blacklists. To look for malicious URLs, the first step is usually to gather URLs that are live on the Internet. Different algorithms are then applied to detect the malicious ones. This paper is about classifying URLs based on features using the machine learning techniques OneR, ZeroR, and Random Forest.
Keywords: Machine Learning, Feature Extraction, Benign, Malicious, Web Pages, Classification Module, Attacks.
INTRODUCTION
The internet has become the medium of choice for the public to search for information, conduct business, and enjoy entertainment. At the same time, it has become the most important platform used by miscreants to attack users. The most common example is the drive-by download attack. In this attack, attackers embed various exploits in the web pages to which malicious URLs point; once victims click on a malicious URL, they are taken to that web page without notice. The attacker may then steal any of the victim's information saved on the host computer, which may lead to grave financial loss. When malicious URLs are sent by friends, victims are more likely to click them.
In addition to drive-by download exploits, attackers also use social engineering to trick victims into installing or running untrusted software. As an example, consider a web page that asks users to install a fake video player that is supposedly necessary to show a video (when, in fact, it is a malware binary). Another example is fake anti-virus programs.
These programs are spread by web pages that scare users into thinking that their machine is infected with malware, luring them into downloading and executing an actual piece of malware as a remedy for the claimed infection.
The web is a very large place and is growing rapidly, with new pages (both benign and malicious) added at a formidable pace. Malicious software has gone through many changes and phases since it was first exposed and detected in hosts and networks, starting with the virus, a program that is self-replicating but not self-transporting, moving on to the worm, which is both self-replicating and self-transporting, and continuing from there. The number of malware attacks is increasing sharply with the rapid growth in complexity and interconnection of emerging information systems. A user who clicks on a malicious URL is likely to become a target. To prevent users from visiting URLs that may be malicious or contain illegal content, the security industry has produced a large amount of research.
According to one study by the Gartner Group [McCall 2007], phishing caused $3.2 billion in losses in the United States in 2007, with 3.6 million victims falling for the attacks, a large increase from the 2.3 million of the previous year. Moore et al. [Moore and Clayton 2007] reported that the losses suffered by consumers and businesses in the US alone in 2007 were about $2 billion. A major share of those losses was caused by one particularly infamous group, known as the "rock phish gang", which uses toolkits to create a large number of unique phishing URLs, putting more pressure on the correctness and precision of blacklist-based anti-phishing techniques. New, previously unseen malicious executables, polymorphic malicious executables that use encryption, and metamorphic malicious executables that adopt obfuscation techniques are more complex and more difficult to detect.
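The pressure that toolkit-generated unique URLs put on blacklists, noted above, can be illustrated with a minimal sketch. It is not the method of this paper: the blacklist entries, the `.example` hosts, and the helper names `is_blacklisted` and `host_is_blacklisted` are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries (made up for illustration, not real phishing URLs).
BLACKLIST = {
    "http://phish.example/login/a1b2c3",
    "http://phish.example/login/d4e5f6",
}


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup, the simplest form of URL blacklisting."""
    return url in BLACKLIST


def host_is_blacklisted(url: str) -> bool:
    """Coarser lookup that compares only the host part of the URL."""
    blocked_hosts = {urlsplit(entry).hostname for entry in BLACKLIST}
    return urlsplit(url).hostname in blocked_hosts


known = "http://phish.example/login/a1b2c3"
fresh = "http://phish.example/login/zz99yy"       # same host, new unique path
moved = "http://other-host.example/login/zz99yy"  # same path pattern, new host

for url in (known, fresh, moved):
    print(url, is_blacklisted(url), host_is_blacklisted(url))
```

In this sketch the exact-match lookup misses `fresh` because its path was never listed, and the host-level lookup misses `moved` because the host changed. This is one reason feature-based classification is considered as a complement to blacklists.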
At present, the most commonly used malware detection software relies on signature-based and heuristic-based methods to identify threats.
International Journal of Engineering Research and General Science Volume 3, Issue 3, Part-2, May-June, 2015 ISSN 2091-2730 114 www.ijergs.org
Signatures are short strings of bytes that are unique to particular programs. They are used to recognize particular threats in executable files, boot records, or memory. The disadvantage of the signature-based method is that it is not effective against customized and unknown malicious executables, because signatures can only be extracted and generated from samples that have already been seen. The heuristic-based method is more complex than signature-based detection; its disadvantages are that it is time consuming and still fails to detect new malicious executables.
The main outcomes of malicious content can be broadly grouped into the following three categories:
- Phishing
- Deceptive advertising
- Computer infection for unauthorized use
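The signature-based detection described above can be sketched as a substring search over raw bytes. This is a simplified illustration, not a real anti-virus engine: the signature names, the byte strings, and the `scan` helper are hypothetical.

```python
from typing import List

# Hypothetical signature database: name -> byte string (made up for illustration).
SIGNATURES = {
    "demo-threat-a": bytes.fromhex("deadbeef4142"),
    "demo-threat-b": b"\x90\x90\xcc\x13\x37",
}


def scan(data: bytes) -> List[str]:
    """Return the sorted names of all signatures that occur anywhere in `data`."""
    return sorted(name for name, sig in SIGNATURES.items() if sig in data)


# A sample that contains the first signature verbatim.
known_sample = b"MZ" + b"\x00" * 8 + bytes.fromhex("deadbeef4142") + b"\x00" * 4
# The same bytes with the last signature byte altered, standing in for a
# customized or obfuscated variant.
variant = b"MZ" + b"\x00" * 8 + bytes.fromhex("deadbeef4143") + b"\x00" * 4

print(scan(known_sample))
print(scan(variant))
```

The scan flags `known_sample` but returns nothing for `variant`, although only one byte differs. This mirrors the weakness stated above: a signature only matches executables whose bytes have already been seen.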
Similar resources
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are the common threat to the businesses, social networks, net-banking etc. Existing approaches have focused on binary detection i.e. either the URL is malicious or benign. Very few literature is found which focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
Random-Forest-Based Analysis of URL Paths
One of the key sources of spreading malware are malicious web sites – either tricking user to install malware imitating legitimate software or, in the case of various exploit kits, initiating malware installation even without any user action. The most common technique against such web sites is blacklisting. However, it provides little to no information about new sites never seen before. Therefo...
A Pattern Recognition Neural Network Model for Detection and Classification of SQL Injection Attacks
Thousands of organisations store important and confidential information related to them, their customers, and their business partners in databases all across the world. The stored data ranges from less sensitive (e.g. first name, last name, date of birth) to more sensitive data (e.g. password, pin code, and credit card information). Losing data, disclosing confidential information or even chang...
Efficient Prediction of Cross-Site Scripting Web Pages using Extreme Learning Machine
Malicious code is a way of attempting to acquire sensitive information by sending malicious code to the trustworthy entity in an electronic communication. JavaScript is the most frequently used command language in the web page environment. If the hackers misuse the JavaScript code there is a possibility of stealing the authentication and confidential information about an organization and user. ...
URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection
Malicious URLs host unsolicited content and are used to perpetrate cybercrimes. It is imperative to detect them in a timely manner. Traditionally, this is done through the usage of blacklists, which cannot be exhaustive, and cannot detect newly generated malicious URLs. To address this, recent years have witnessed several efforts to perform Malicious URL Detection using Machine Learning. The mo...