GitHub - vinbhaskara/MalwareGAN: Visualizing malware behavior, and proactive protection using GANs against zero-day attacks.

Emulating malware authors for proactive protection using GANs over a distributed image visualization of dynamic file behavior

Cite as: V.S. Bhaskara and D. Bhattacharyya. arXiv preprint arXiv:1807.07525 [stat.ML] (2018).

References to the code

The trained WGAN-GP model is based on the code published at https://github.com/igul222/improved_wgan_training.
We used the improved_wgan_training/gan_64x64.py script with the network architectures defined by the GoodGenerator and GoodDiscriminator functions.
The 64-bit dHash used per channel is based on the implementation at https://github.com/JohannesBuchner/imagehash. The color_dHash192.py script extends this hash to color images by concatenating the per-channel dHashes into a single 192-bit hash.
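
As a rough illustration of the channel-wise concatenation described above, the following pure-Python sketch computes a 64-bit difference hash per channel and joins the three into a 192-bit value. It is not the code from color_dHash192.py: the function names are hypothetical, the R, G, B channel order is an assumption, and the downsampling uses simple block averaging rather than the Pillow resampling that the imagehash library relies on, so the resulting hashes will generally differ from the ones listed in this repository.

```python
def dhash_bits(channel, hash_size=8):
    """Return hash_size * hash_size difference-hash bits for one channel.

    `channel` is a 2D list (rows of pixel intensities), at least
    hash_size rows by hash_size + 1 columns. Assumption: block averaging
    stands in for the image resize used by real dHash libraries.
    """
    height, width = len(channel), len(channel[0])
    rows, cols = hash_size, hash_size + 1
    small = []
    for r in range(rows):
        r0 = r * height // rows
        r1 = max(r0 + 1, (r + 1) * height // rows)
        row = []
        for c in range(cols):
            c0 = c * width // cols
            c1 = max(c0 + 1, (c + 1) * width // cols)
            block = [channel[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    # A bit is 1 when a downsampled pixel is brighter than its left neighbour.
    return [int(small[r][c + 1] > small[r][c])
            for r in range(rows) for c in range(hash_size)]


def color_dhash192(rgb_image):
    """Concatenate per-channel dHashes (assumed order R, G, B) into 48 hex digits."""
    bits = []
    for ch in range(3):
        channel = [[pixel[ch] for pixel in row] for row in rgb_image]
        bits.extend(dhash_bits(channel))
    value = int("".join(map(str, bits)), 2)
    return format(value, "048x")


# Synthetic 64x64 image: red rises left to right, green is flat, blue falls.
image = [[(4 * x, 128, 252 - 4 * x) for x in range(64)] for _ in range(64)]
print(color_dhash192(image))
```

Keeping the three 64-bit hashes side by side, rather than hashing a grayscale conversion, means that two images differing only in one color channel still produce different hashes.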

Dataset

dataset_filedetails.csv: Lists the file SHA256 hashes and the file names of the 12,006 distinct executables used.
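
To check a local executable against a digest listed in the file, its SHA256 can be computed with the Python standard library. This is a minimal sketch: the helper name is hypothetical, and because the column layout of dataset_filedetails.csv is not described here, the comparison against the CSV is left out.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large executables are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Since hexdigest() returns lowercase hex, a case-insensitive comparison is the safer choice if the listed digests turn out to be uppercase.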

API Calls Hooked

HookedApiCallList.txt: Lists all the 1,984 individual API calls that were hooked for determining the call invocation sequences of executables.

Figures (full resolution)

Figure 3a: figure3a_samples_clean_preview.png: Samples of 64x64 image representations corresponding to 64 distinct Clean files, arranged in a grid of 8x8, chosen randomly from the dataset.

Figure 3b: figure3b_samples_malware_preview.png: Samples of 64x64 image representations corresponding to 64 distinct Malicious files, arranged in a grid of 8x8, chosen randomly from the dataset.

Figure 7a: figure7a_samples_malware_gan_train.png: Samples of 64x64 image representations corresponding to 32 distinct Malicious files randomly chosen from the images used for Training the WGAN-GP model.

Figure 7b: figure7b_samples_malware_gan_valid.png: Samples of 64x64 image representations corresponding to 32 distinct Malicious files randomly chosen from the images used for Validating the WGAN-GP model.

Figure 10b: figure10b_wgan_generated_samples.png: Samples of 64x64 image representations corresponding to 64 synthetic images generated by the Generator after training the WGAN-GP model for 45,000 generator iterations.

Software Categorization

software_categorization_details/: Contains the 64x64 PNGs of the scaled images used in Table 3 of the paper for demonstrating software categorization using images.

software_categorization_details/table3_filedetails.csv: Lists the details of the files used in Table 3 of the paper, including the file names, SHA256 digests, and their corresponding image hashes (SHA256 and 192-bit color dHash).

software_categorization_details/figure5_filedetails_categories_dhash_cutoff.csv: Lists the details of the 254 files belonging to 21 file categories used for determining an optimal dHash cutoff demonstrated in Figure 5 of the paper.
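
This README does not state how the cutoff was selected or what its value is. One plausible reading is that two files are treated as belonging to the same category when the Hamming distance between their 192-bit hashes is at or below the cutoff. The sketch below follows that assumption: the function names, the pairwise accuracy criterion, and the toy hashes are all hypothetical and are not taken from the paper or the CSV.

```python
def hamming_distance(hex_a, hex_b):
    """Number of differing bits between two equal-length hex-encoded hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def pair_accuracy(labelled_hashes, cutoff):
    """Fraction of file pairs for which 'distance <= cutoff' agrees with
    whether the two files carry the same category label."""
    correct = total = 0
    for i in range(len(labelled_hashes)):
        for j in range(i + 1, len(labelled_hashes)):
            hash_a, cat_a = labelled_hashes[i]
            hash_b, cat_b = labelled_hashes[j]
            predicted_same = hamming_distance(hash_a, hash_b) <= cutoff
            correct += predicted_same == (cat_a == cat_b)
            total += 1
    return correct / total


def best_cutoff(labelled_hashes, max_bits=192):
    """Smallest cutoff that maximises pair_accuracy (assumed criterion)."""
    return max(range(max_bits + 1),
               key=lambda cutoff: pair_accuracy(labelled_hashes, cutoff))


# Toy data only: two made-up categories with two made-up hashes each.
toy = [
    ("0" * 48, "category_a"),
    ("0" * 47 + "3", "category_a"),
    ("f" * 48, "category_b"),
    ("f" * 47 + "c", "category_b"),
]
print(best_cutoff(toy))
```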

Vector Arithmetic and Image Decodings

vector_arithmetic_and_decodings/: Contains the PNGs used to demonstrate the decoding of images into API information and the vector arithmetic in noise space versus pixel space. The decodings of the corresponding images are in the vector_arithmetic_and_decodings/image_decodings/ folder.

Training Information

Training the GAN

The WGAN-GP model was trained on 4 NVIDIA GTX TITAN X GPUs for about a day (~1.7 seconds per generator iteration, or roughly 21 hours for the 45,000 generator iterations reported for Figure 10b) using TensorFlow 1.5.0 on an Ubuntu 14.04 system with NVIDIA driver version 389.80, cuDNN 7, and CUDA 9.0.

Training the XGBoost Model

The XGBoost model of Section 4 of the paper was trained with the XGBoost 0.6 release using the following booster hyperparameters:

{'eval_metric': 'mlogloss', 'num_estimators': 200, 'alpha': 0, 'num_class': 2, 'booster': 'gbtree', 'colsample_bytree': 0.7, 'min_child_weight': 1e-06, 'subsample': 0.5, 'eta': 0.1, 'objective': 'multi:softprob', 'max_depth': 10, 'gamma': 0, 'lambda': 0}
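
A minimal sketch of how these hyperparameters might be passed to the native XGBoost training API follows. It is not the authors' training script. The helper names are hypothetical, and one assumption deserves emphasis: 'num_estimators' is not a name I can confirm as a native booster parameter, so the sketch treats it as the intended number of boosting rounds and passes it as num_boost_round. The xgboost import is kept inside the training function so the parameter handling can be read and run without the package installed.

```python
README_PARAMS = {
    'eval_metric': 'mlogloss', 'num_estimators': 200, 'alpha': 0,
    'num_class': 2, 'booster': 'gbtree', 'colsample_bytree': 0.7,
    'min_child_weight': 1e-06, 'subsample': 0.5, 'eta': 0.1,
    'objective': 'multi:softprob', 'max_depth': 10, 'gamma': 0, 'lambda': 0,
}


def split_booster_params(readme_params):
    """Separate the (assumed) round count from the booster parameters.

    Assumption: 'num_estimators' means the number of boosting rounds.
    If it is absent, fall back to 10, the default of xgboost.train.
    """
    params = dict(readme_params)  # copy so the caller's dict is untouched
    num_rounds = params.pop('num_estimators', 10)
    return params, num_rounds


def train_and_predict(features, labels, readme_params=README_PARAMS):
    """Train on (features, labels) and return per-class probabilities."""
    import xgboost as xgb  # third-party dependency, imported lazily

    params, num_rounds = split_booster_params(readme_params)
    dtrain = xgb.DMatrix(features, label=labels)
    booster = xgb.train(params, dtrain, num_boost_round=num_rounds)
    # With 'multi:softprob', predict yields a probability for each class.
    return booster.predict(dtrain)
```

Predicting on the training matrix is only there to keep the sketch short; an actual evaluation would use held-out data.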
