Malware Detection using Classification of Variable-Length Sequences
نویسندگان
چکیده مقاله:
In this paper, a novel method based on the graph is proposed to classify the sequence of variable length as feature extraction. The proposed method overcomes the problems of the traditional graph with variable length of data, without fixing length of sequences, by determining the most frequent instructions and insertion the rest of instructions on the set of “other”, save speed and memory. According to features and the similarities of them, a score is given to each sample and that is used for classification. To improve the results, the method is not used alone, but in the two approaches, this method is combined with other existing Technique to get better results. In the first approach, which can be considered as a feature extraction, extracted features from scoring techniques (Hidden Markov Model, simple substitution distance and similarity graph) on op-code sequences, hexadecimal sequences and system calls are combined at classifier input. The second approach consists of two steps, in the first step; the scores which obtained from each of the scoring Technique are given to the three support vector machine. The outcomes are combined according to the weight of each Technique and the final decision is taken based on the majority vote. Among the components of the support vector machine, when given a higher weight in the similarity graph method (the proposed method), the result is better, Because the similarity graph method is more accurate than the other two methods. Then, in the second section, considering the strengths and benefits of each classifier, classifier outputs are combined and the majority voting is used. Three methods have been tested for group combinations, including Ensemble Averaging, Bagging, and Boosting. Ensemble Averaging consisting of the combination of four classifiers of random forests, a support vector machine (as obtained in the previous section), K nearest neighbors and naive Bayes, and the final decision is taken based on the majority vote; therefore, it is used as the proposed method. The proposed approach could detect metamorphic malware from Vxheaven set and also determines categories of malware with accuracy of 97%, while the SSD and HMM methods under the same conditions could detect malware with an accuracy of 84% and 80% respectively.
منابع مشابه
Android Malware Detection using Deep Learning on API Method Sequences
Android OS experiences a blazing popularity since the last few years. This predominant platform has established itself not only in the mobile world but also in the Internet of Things (IoT) devices. This popularity, however, comes at the expense of security, as it has become a tempting target of malicious apps. Hence, there is an increasing need for sophisticated, automatic, and portable malware...
متن کاملA New Android Malware Detection Method Using Bayesian Classification
Mobile malware has been growing in scale and complexity as smartphone usage continues to rise. Android has surpassed other mobile platforms as the most popular whilst also witnessing a dramatic increase in malware targeting the platform. A worrying trend that is emerging is the increasing sophistication of Android malware to evade detection by traditional signature-based scanners. As such, Andr...
متن کاملPolymorphic malware detection using sequence classification methods and ensembles
Identifying malicious software executables is made difficult by the constant adaptations introduced by miscreants in order to evade detection by antivirus software. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence classification have been developed by the bioinformatics and computational biology communities. In this paper, we apply ...
متن کاملClustering Variable Length Sequences by Eigenvector Decomposition Using HMM
We present a novel clustering method using HMM parameter space and eigenvector decomposition. Unlike the existing methods, our algorithm can cluster both constant and variable length sequences without requiring normalization of data. We show that the number of clusters governs the number of eigenvectors used to span the feature similarity space. We are thus able to automatically compute the opt...
متن کاملPhysical Complexity of Variable Length Symbolic Sequences
A measure called Physical Complexity is established and calculated for a population of sequences, based on statistical physics, automata theory, and information theory. It is a measure of the quantity of information in an organism’s genome. It is based on Shannon’s entropy, measuring the information in a population evolved in its environment, by using entropy to estimate the randomness in the g...
متن کاملمنابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ذخیره در منابع من قبلا به منابع من ذحیره شده{@ msg_add @}
عنوان ژورنال
دوره 16 شماره 2
صفحات 137- 146
تاریخ انتشار 2019-09
با دنبال کردن یک ژورنال هنگامی که شماره جدید این ژورنال منتشر می شود به شما از طریق ایمیل اطلاع داده می شود.
کلمات کلیدی برای این مقاله ارائه نشده است
میزبانی شده توسط پلتفرم ابری doprax.com
copyright © 2015-2023