Matching Global Data References in Related Executables
Authors
Abstract
Recent research and development efforts have compared malware variants. A number of these projects have focused on identifying functions through the use of signature-based classifiers. We introduce three new classifiers that characterize a function’s use of global data. Experiments on malware show that we can meaningfully correlate functions on the basis of their global data references even when the functions themselves share little code. We also present an algorithm that combines existing classifiers and our new ones into an ensemble for correlating functions in two binary programs. The resulting ensemble classifier dominates the previously reported classifiers.
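The abstract does not specify how the three classifiers work, so the following is only a minimal sketch of the general idea of correlating functions by their global data references. It assumes, hypothetically, that each function has already been reduced to a set of stable labels for the global data it touches (for example, the contents of referenced strings), and it scores candidate pairs with Jaccard similarity. The names `jaccard`, `match_functions`, and the `threshold` parameter are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two reference sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_functions(
    old: Dict[str, Set[str]],
    new: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Greedily pair functions from two binaries by global-data overlap.

    `old` and `new` map a function name to the set of (hypothetical)
    global data labels that the function references.
    """
    # Score every cross-binary pair and keep those above the threshold.
    candidates = []
    for old_name, old_refs in old.items():
        for new_name, new_refs in new.items():
            score = jaccard(old_refs, new_refs)
            if score >= threshold:
                candidates.append((score, old_name, new_name))

    # Best scores first; names break ties so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_old: Set[str] = set()
    used_new: Set[str] = set()
    matches: List[Tuple[str, str, float]] = []
    for score, old_name, new_name in candidates:
        if old_name in used_old or new_name in used_new:
            continue  # each function is matched at most once
        used_old.add(old_name)
        used_new.add(new_name)
        matches.append((old_name, new_name, score))
    return matches


if __name__ == "__main__":
    variant_a = {"f1": {"str:GET", "str:POST"}, "f2": {"str:key"}}
    variant_b = {"g1": {"str:GET", "str:POST", "str:PUT"}, "g2": {"str:key"}}
    for pair in match_functions(variant_a, variant_b):
        print(pair)
```

A pairing like this uses no instruction bytes at all, which is why it can still relate two functions whose code has diverged. An ensemble of the kind the abstract mentions would combine such a score with code-based classifiers, but how the paper does so is not stated here.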
Similar resources
Detecting a malicious executable without prior knowledge of its patterns
To detect malicious executables, often spread as email attachments, two types of algorithms are usually applied under instance-based statistical learning paradigms: 1) Signature-based template matching, which finds unique tell-tale characteristics of a malicious executable and thus is capable of matching those with known signatures; 2) Two-class supervised learning, which determines a set of fe...
Information Fusion for Entity Matching in Unstructured Data
Every day the global media system produces an abundance of news stories, all containing many references to people. An important task is to automatically generate reliable lists of people by analysing news content. We describe a system that leverages large amounts of data for this purpose. Lack of structure in this data gives rise to a large number of ways to refer to any particular person. Enti...
Complete forcing numbers of polyphenyl systems
The idea of “forcing” has long been used in many research fields, such as colorings, orientations, geodetics and dominating sets in graph theory, as well as Latin squares, block designs and Steiner systems in combinatorics (see [1] and the references therein). Recently, forcing on perfect matchings has been attracting more researchers' attention. A forcing set of M is a subset of M contained...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
Reducing Graph Matching to Tree Matching for XML Queries with ID References
ID/IDREF is an important and widely used feature in XML documents for eliminating data redundancy. Most existing algorithms consider an XML document with ID references as a graph and perform graph matching for queries involving ID references. Graph matching naturally brings higher complexity compared with original tree matching algorithms that process XML queries. In this paper, we make use of ...