Malware Classification using Convolutional Neural Networks (CNNs)

This GitHub repository contains an implementation of a malware classification system using Convolutional Neural Networks (CNNs). The goal of the project is to build a model that accurately classifies different types of malware from an image representation of each executable.

The first implementation, malimg_classifier, is trained on the 25 malware classes of the Malimg dataset.

The second implementation, combined_classifier, extends the dataset with a benign class built from legitimate PE samples in DikeDataset.

The full explanation of the experiments can be found in presentation.pdf.

Introduction

Malware (malicious software) poses a significant threat to computer systems and networks worldwide. It is crucial to detect and classify malware accurately to prevent potential security breaches. This project focuses on leveraging the power of CNNs, a deep learning technique commonly used in computer vision tasks, to classify malware samples into different categories.

Dataset

The Malimg dataset used for this project contains labeled samples of different types of malware. Samples are grouped into one directory per class, and the directory name indicates the malware class.

The benign subset is stored in a separate folder, uploaded to this repository as benign_data, while the Malimg dataset can be found here.

The dataset is organized in the following structure:

malimg_dataset/
├── class1/
│ ├── malware1.png
│ ├── malware2.png
│ ├── ...
├── class2/
│ ├── malware3.png
│ ├── malware4.png
│ ├── ...
├── ...
benign_data/
├── benign_imgs/
│ ├── sample1.png
│ ├── sample2.png
│ ├── ...
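The layout above maps naturally onto a list of (image path, class name) pairs. The following is a minimal sketch of such an indexer, not the repository's own loader: the function names `index_image_folders` and `build_combined_index` and the label string `"Benign"` are assumptions made for illustration, and only the folder names shown in the tree are taken from this README.

```python
from pathlib import Path


def index_image_folders(root, extensions=(".png",)):
    """Return sorted (image_path, class_name) pairs from a one-folder-per-class tree."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in extensions:
                samples.append((image_path, class_dir.name))
    return samples


def build_combined_index(malimg_root, benign_root, benign_label="Benign"):
    """Merge the malware folders with the single benign folder (hypothetical helper)."""
    samples = index_image_folders(malimg_root)
    benign_dir = Path(benign_root) / "benign_imgs"
    samples += [(path, benign_label) for path in sorted(benign_dir.glob("*.png"))]
    # Map every class name to a stable integer index for training.
    class_names = sorted({label for _, label in samples})
    class_to_index = {name: i for i, name in enumerate(class_names)}
    return samples, class_to_index
```

Calling `build_combined_index("malimg_dataset", "benign_data")` would return the sample list together with a class-to-integer mapping that a training script can use for its labels.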

Dataset samples for each class

Benign data conversion

The full conversion code is in utils/data_conversion.ipynb. It was integrated from here and here.
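For readers who want the idea without opening the notebook, here is a minimal sketch of the usual byte-to-image conversion: every byte of the executable becomes one grayscale pixel, and the byte stream is cut into rows of fixed width. This is not the notebook's code. The width table follows the scheme described by Nataraj et al. (2011) and is an assumption here, as are the names `choose_width` and `bytes_to_grayscale`; the notebook may use different widths or resize the result.

```python
import numpy as np

# (upper file-size limit in kB, image width in pixels).
# Assumed values, modeled on Nataraj et al. (2011); not taken from this repository.
WIDTH_BY_SIZE_KB = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]


def choose_width(num_bytes):
    """Pick an image width from the file size; files of 1000 kB or more get 1024."""
    size_kb = num_bytes / 1024
    for limit_kb, width in WIDTH_BY_SIZE_KB:
        if size_kb < limit_kb:
            return width
    return 1024


def bytes_to_grayscale(data):
    """Turn raw file bytes into a 2D uint8 array with one byte per pixel."""
    width = choose_width(len(data))
    height = len(data) // width
    if height == 0:
        raise ValueError("input is shorter than one image row")
    # Trailing bytes that do not fill a complete row are dropped.
    pixels = np.frombuffer(data, dtype=np.uint8)[: height * width]
    return pixels.reshape(height, width)
```

With the contents of a PE file read via `Path(...).read_bytes()`, the returned array can be written to disk with an imaging library, for example Pillow's `Image.fromarray(image).save("sample1.png")`.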

Model Architecture

The CNN model architecture used in this project consists of several convolutional layers, followed by pooling layers and fully connected layers. The CNN workflow is the following:
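This README does not list the exact layer configuration, so the following is only a hypothetical illustration of how a convolution, pooling, fully connected stack transforms an image. It traces feature-map shapes with the standard output-size formula, floor((n + 2p - k) / s) + 1. The input size, kernel sizes, and filter counts are all assumptions chosen for the example and are not the project's settings.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    height, width, channels = input_shape
    shapes = [input_shape]
    for layer in layers:
        kind = layer["kind"]
        if kind == "conv":
            pad = layer.get("padding", 0)
            height = conv_output_size(height, layer["kernel"], padding=pad)
            width = conv_output_size(width, layer["kernel"], padding=pad)
            channels = layer["filters"]  # one output channel per filter
        elif kind == "pool":
            # Non-overlapping pooling: the stride equals the window size.
            height = conv_output_size(height, layer["size"], stride=layer["size"])
            width = conv_output_size(width, layer["size"], stride=layer["size"])
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((height, width, channels))
    return shapes


# Hypothetical stack: two conv + pool stages on a 64x64 grayscale image.
EXAMPLE_LAYERS = [
    {"kind": "conv", "kernel": 3, "filters": 32, "padding": 1},
    {"kind": "pool", "size": 2},
    {"kind": "conv", "kernel": 3, "filters": 64, "padding": 1},
    {"kind": "pool", "size": 2},
]

shapes = trace_shapes((64, 64, 1), EXAMPLE_LAYERS)
# Number of inputs the first fully connected layer would receive.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Tracing shapes this way is a quick check that the flattened feature vector has the size the first fully connected layer expects before any training is run.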

Final Training

Confusion matrix on combined classifier

Evaluation metrics on combined classifier

Overall	precision	recall	f1-score	support
accuracy	0.8666	0.8666	0.8666	2054.0
macro avg	0.81705	0.88241	0.83163	2054.0
weighted avg	0.86608	0.8666	0.85959	2054.0

Evaluation metrics for each class on combined classifier

class	precision	recall	f1-score	support
Adialer.C	0.96	1.0	0.97959	24.0
Agent.FYI	0.95833	1.0	0.97872	23.0
Allaple.A	0.99313	0.98132	0.98719	589.0
Allaple.L	1.0	0.99686	0.99843	318.0
Alueron.gen!J	0.975	1.0	0.98734	39.0
Autorun.K	0.11602	1.0	0.20792	21.0
Benign	0.98658	0.75	0.85217	196.0
C2LOP.P	0.39216	0.68966	0.5	29.0
C2LOP.gen!g	0.63158	0.9	0.74227	40.0
Dialplatform.B	1.0	0.97143	0.98551	35.0
Dontovo.A	0.94118	1.0	0.9697	32.0
Fakerean	0.98611	0.93421	0.95946	76.0
Instantaccess	0.97727	1.0	0.98851	86.0
Lolyda.AA1	0.93333	1.0	0.96552	42.0
Lolyda.AA2	0.91892	0.94444	0.93151	36.0
Lolyda.AA3	0.88462	0.95833	0.92	24.0
Lolyda.AT	0.9375	0.96774	0.95238	31.0
Malex.gen!J	0.96154	0.92593	0.9434	27.0
Obfuscator.AD	1.0	1.0	1.0	28.0
Rbot!gen	0.88571	1.0	0.93939	31.0
Skintrim.N	0.94118	1.0	0.9697	16.0
Swizzor.gen!E	0.60714	0.68	0.64151	25.0
Swizzor.gen!I	0.5	0.30769	0.38095	26.0
VB.AT	0.89888	0.98765	0.94118	81.0
Wintrim.BX	0.85714	0.94737	0.9	19.0
Yuner.A	0.0	0.0	0.0	160.0
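The overall averages can be cross-checked against the per-class rows. The sketch below copies precision, recall, and support from the table above and recomputes the macro averages (unweighted means over the 26 classes) and the support-weighted recall, which by definition equals accuracy. The variable names are illustrative, and small differences in the last digit are expected because the inputs are already rounded.

```python
# (class, precision, recall, support), copied from the per-class table.
ROWS = [
    ("Adialer.C", 0.96, 1.0, 24), ("Agent.FYI", 0.95833, 1.0, 23),
    ("Allaple.A", 0.99313, 0.98132, 589), ("Allaple.L", 1.0, 0.99686, 318),
    ("Alueron.gen!J", 0.975, 1.0, 39), ("Autorun.K", 0.11602, 1.0, 21),
    ("Benign", 0.98658, 0.75, 196), ("C2LOP.P", 0.39216, 0.68966, 29),
    ("C2LOP.gen!g", 0.63158, 0.9, 40), ("Dialplatform.B", 1.0, 0.97143, 35),
    ("Dontovo.A", 0.94118, 1.0, 32), ("Fakerean", 0.98611, 0.93421, 76),
    ("Instantaccess", 0.97727, 1.0, 86), ("Lolyda.AA1", 0.93333, 1.0, 42),
    ("Lolyda.AA2", 0.91892, 0.94444, 36), ("Lolyda.AA3", 0.88462, 0.95833, 24),
    ("Lolyda.AT", 0.9375, 0.96774, 31), ("Malex.gen!J", 0.96154, 0.92593, 27),
    ("Obfuscator.AD", 1.0, 1.0, 28), ("Rbot!gen", 0.88571, 1.0, 31),
    ("Skintrim.N", 0.94118, 1.0, 16), ("Swizzor.gen!E", 0.60714, 0.68, 25),
    ("Swizzor.gen!I", 0.5, 0.30769, 26), ("VB.AT", 0.89888, 0.98765, 81),
    ("Wintrim.BX", 0.85714, 0.94737, 19), ("Yuner.A", 0.0, 0.0, 160),
]

num_classes = len(ROWS)
total_support = sum(support for _, _, _, support in ROWS)

# Macro average: every class counts equally, regardless of its size.
macro_precision = sum(p for _, p, _, _ in ROWS) / num_classes
macro_recall = sum(r for _, _, r, _ in ROWS) / num_classes

# Weighted recall: classes count in proportion to their support.
# recall * support is the number of correctly classified samples per class,
# so this ratio is the overall accuracy.
weighted_recall = sum(r * s for _, _, r, s in ROWS) / total_support

print(f"classes={num_classes} support={total_support}")
print(f"macro precision={macro_precision:.5f} macro recall={macro_recall:.5f}")
print(f"weighted recall (accuracy)={weighted_recall:.4f}")
```

The gap between the macro and weighted figures comes from a few classes. In particular, Yuner.A (160 samples) is never predicted correctly, and the Autorun.K precision of 0.11602 equals 21 / (21 + 160), which would be consistent with all Yuner.A samples being labeled Autorun.K; the confusion matrix is the place to confirm this.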

References

Gibert, D., Mateu, C., Planes, J. et al. Using convolutional neural networks for classification of malware represented as images.

Gibert, D., Mateu, C., Planes, J. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications.

Yue, S., Wang, T. Imbalanced Malware Images Classification: a CNN based Approach.

Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B. (2011). Malware Images: Visualization and Automatic Classification. doi: 10.1145/2016904.2016908.

Kalash, M., Rochan, M., Mohammed, N., Bruce, N. D. B., Wang, Y., Iqbal, F. Malware Classification with Deep Convolutional Neural Networks. 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 2018, pp. 1-5, doi: 10.1109/NTMS.2018.8328749.

Tuan, Anh Pham; Phuong, An Tran Hung; Thanh, Nguyen Vu; Van, Toan Nguyen (2018). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset. figshare. Dataset. https://figshare.com/articles/dataset/Malware_Detection_PE-Based_Analysis_Using_Deep_Learning_Algorithm_Dataset/6635642/1
