A Distributed Malware Detection Method Based on Machine Learning

Dong Limian; Zuo Xiaojun; Qu Wu; Wang Lijun

Article Abstract

Dong Limian, Zuo Xiaojun, Qu Wu, Wang Lijun. A Distributed Malware Detection Method Based on Machine Learning [J]. 情报工程, 2015, 1(6): 090-101

A Distributed Approach to Fast Malware Classification Based on Machine Learning

DOI: 10.3772/j.issn.2095-915X.2015.06.013

Chinese keywords: malware detection, random forest algorithm, metadata, Spark

English keywords: malware classification, random forest algorithm, metadata, Spark

Funding:

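Author	Affiliation
Dong Limian	Electric Power Research Institute, State Grid Hebei Electric Power Company
Zuo Xiaojun	Electric Power Research Institute, State Grid Hebei Electric Power Company
Qu Wu	Core Research Institute, Beijing Venustech Information Security Technology Co., Ltd.
Wang Lijun	Institute of Scientific and Technical Information of China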

Abstract views: 3114

Full-text downloads: 3726

Chinese abstract (translated):

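With the rapid growth in the volume of malware, traditional malware detection methods are gradually becoming ineffective. Mainstream heuristic techniques rely on dynamic execution, which makes them hard to deploy in many applications. It is therefore important to classify suspicious malware PE files into their corresponding malware families using static features. In this paper, based on the results of analyzing malware PE files, we propose the concept of metadata and, building on it, implement a prototype for fast malware detection, PE-Classifier. In a Spark distributed environment, the random forest classification algorithm is applied to malware metadata so that malware can be classified and detected quickly and accurately. Experimental results show that, by analyzing the metadata of a large number of malware PE samples, the proposed prototype system PE-Classifier can judge the semantic similarity of samples from the similarity of their metadata, thereby assisting detection and making antivirus software more effective.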

English abstract:

With the rapid increase of malware, conventional malware detection approaches increasingly fail. Modern heuristic technologies often operate dynamically, which is not feasible in many applications because of the effort involved and the scale of the files. It is therefore important for malware analysis to classify unknown malware files into malware families accurately on the basis of their static characteristics. In this paper, we introduce a distributed approach to fast malware classification based on gene metadata of malware PE executables. We use a machine learning technique, the random forest algorithm, to classify malware quickly and accurately. The results of large-scale classification show that our prototype system successfully determined some semantic similarity between malware samples according to their gene metadata, making antivirus software more effective.
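
The abstract does not list which fields make up the "metadata" of a PE file. As a minimal sketch only, the Python below assumes the metadata is a handful of COFF file-header fields and shows how such static features could be read without executing the sample. The names `COFF_FIELDS`, `extract_pe_metadata`, `to_feature_vector`, and `build_minimal_pe` are hypothetical and do not come from the paper or from PE-Classifier.

```python
import struct

# Assumption: the COFF file-header fields stand in for the paper's
# "metadata"; the real feature set is not given in the abstract.
COFF_FIELDS = (
    "machine",
    "number_of_sections",
    "time_date_stamp",
    "pointer_to_symbol_table",
    "number_of_symbols",
    "size_of_optional_header",
    "characteristics",
)


def extract_pe_metadata(data: bytes) -> dict:
    """Read the COFF file header of a PE image without executing it."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")
    # The 20-byte COFF header follows the 4-byte signature (little-endian).
    values = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))


def to_feature_vector(metadata: dict) -> list:
    """Flatten the metadata into a numeric vector for a classifier."""
    return [float(metadata[name]) for name in COFF_FIELDS]


def build_minimal_pe(number_of_sections: int, timestamp: int) -> bytes:
    """Build a synthetic DOS header + PE signature + COFF header for demos."""
    dos_header = bytearray(0x40)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x40)  # PE header right after
    coff = struct.pack(
        "<HHIIIHH", 0x14C, number_of_sections, timestamp, 0, 0, 0xE0, 0x0102
    )
    return bytes(dos_header) + b"PE\x00\x00" + coff


if __name__ == "__main__":
    sample = build_minimal_pe(number_of_sections=3, timestamp=1420070400)
    print(to_feature_vector(extract_pe_metadata(sample)))
```

Vectors of this kind, labeled with malware families, could then be passed to a random forest implementation such as the one in Spark MLlib. How the paper distributes extraction and training across the cluster is not described in the abstract.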
