Phishing website detection using weighted feature line embedding

Authors

Abstract:

Phishing aims to obtain users' private information without their permission by building a new website that mimics a trusted one. Information technology specialists do not agree on a single definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. Moreover, the available training samples include abnormal samples that cause classification errors; for instance, phishing samples may have features similar to those of legitimate ones, and vice versa. This paper proposes a supervised feature extraction method, called weighted feature line embedding, to solve these problems. The proposed method virtually generates training samples by utilizing the feature line metric, and can thereby solve the small sample size problem. Moreover, by assigning appropriate weights to each pair of feature points, it corrects the undesirable effect of abnormal samples. The features extracted by the method improve the performance of phishing website detection, especially when small training sets are used.
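The feature line metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard nearest-feature-line definition: a feature line passes through two feature points of the same class, and a query sample is projected onto that line, so the projection acts as a virtual training sample. The function names are hypothetical, and the paper's weighting scheme is not described in the abstract, so it is not reproduced here.

```python
from math import sqrt

def feature_line_projection(x, x1, x2):
    """Project sample x onto the line through feature points x1 and x2.

    Assumption: this follows the usual nearest-feature-line formula,
    p = x1 + t * (x2 - x1) with t = <x - x1, x2 - x1> / ||x2 - x1||^2.
    The projection may lie outside the segment [x1, x2] (extrapolation).
    """
    direction = [b - a for a, b in zip(x1, x2)]
    denom = sum(d * d for d in direction)
    if denom == 0.0:
        # x1 and x2 coincide, so the line degenerates to a single point.
        return list(x1)
    offset = [xi - a for xi, a in zip(x, x1)]
    t = sum(o * d for o, d in zip(offset, direction)) / denom
    return [a + t * d for a, d in zip(x1, direction)]

def feature_line_distance(x, x1, x2):
    """Euclidean distance from x to its projection on the feature line."""
    p = feature_line_projection(x, x1, x2)
    return sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
```

For example, with `x1 = [0.0, 0.0]` and `x2 = [2.0, 0.0]`, the sample `[1.0, 1.0]` projects to `[1.0, 0.0]` at distance `1.0`. Because every pair of same-class points defines its own line, a small training set yields many such virtual samples, which is the intuition behind using this metric for small sample sizes.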

Similar resources

An Image-based Feature Extraction Approach for Phishing Website Detection

Phishing website creators and anti-phishing defenders are in an arms race. Cloning a website is fairly easy and can be automated by any junior programmer. Attempting to recognize numerous phishing links posted in the wild e.g. on social media sites or in email is a constant game of escalation. Automated phishing website detection systems need both speed and accuracy to win. We present a new met...

Iterative Construction of Hierarchical Classifiers for Phishing Website Detection

This article is devoted to a new iterative construction of hierarchical classifiers in SimpleCLI for the detection of phishing websites. Our new construction of hierarchical systems creates ensembles of ensembles in SimpleCLI by iteratively linking a top-level ensemble to another middle-level ensemble instead of a base classifier so that the top-level ensemble can generate a large multilevel sy...

Feature Selection for Improved Phishing Detection

Phishing – a hotbed of a multibillion-dollar underground economy – has become an important cybersecurity problem. The centralized blacklist approach used by most web browsers usually fails to detect zero-day attacks, leaving ordinary users vulnerable to new phishing schemes; therefore, machine learning based approaches have been implemented for phishing detection. Many existing techniques in ...

Nearest feature line embedding for face hallucination

A new manifold learning method, called nearest feature line (NFL) embedding, for face hallucination is proposed. While many manifold learning based face hallucination algorithms have been proposed in recent years, most of them apply the conventional nearest neighbour metric to derive the subspace and may not effectively characterise the geometrical information of the samples, especially when th...

Phishing Detection Using Neural Network

The goal of this project is to apply multilayer feedforward neural networks to phishing email detection and evaluate the effectiveness of this approach. We design the feature set, process the phishing dataset, and implement the neural network (NN) systems. We then use cross validation to evaluate the performance of NNs with different numbers of hidden units and activation functions. We also com...

Associative Classification Mining for Website Phishing Classification

Website phishing is one of the crucial research topics for the internet community due to the massive number of daily online transactions. Predicting phishing activity for a website is a typical classification problem in data mining, where different website features such as URL length, prefix and suffix, IP address, etc., are used to discover concealed correlations (knowledg...

Journal title

volume 9 issue 2

pages 49-61

publication date 2017-07-31
