"Hacking in the sense of deconstructing an idea, hardware, anything and getting it to do something it wasn’t intended or to better understand how something works." (BSides CFP)
So hacking here means we want to quickly deconstruct data, understand what we've got and how to best utilize it for the problem at hand.
The primary motivation for these exercises is to explore the nexus of IPython, Pandas and Scikit Learn on security data of various kinds. The exercises will often intentionally show common missteps, warts in the data, paths that didn't work out that well and results that could definitely be improved upon. In general we're trying to capture what worked and what didn't; not only is that more realistic, but it's often much more informative to the reader. :)
(here's a quick way to get up and running)
1. Obtain your Miniconda version at https://repo.continuum.io/miniconda/
Miniconda2 = python2 | Miniconda3 = python3
Python3 Miniconda for 64-bit Windows 10:
Miniconda3-4.5.4-Windows-x86_64.exe
2. You can verify the checksum with Git Bash:
md5sum /path/to/file/anaconda-file-name-here.exe
3. After installation, open Anaconda Prompt and run the following:
conda create --name dga
conda activate dga
conda install pandas
conda install scikit-learn
conda install -c conda-forge tldextract
conda install jupyter
4. After the libraries are downloaded, run the following command to start the Jupyter Notebook interface:
jupyter notebook
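If Git Bash isn't handy, the checksum check in step 2 can also be done from Python. This is a minimal sketch using only the standard library; the helper names `file_md5` and `matches_published_md5` are ours for illustration and are not part of any install tooling.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large installer never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return file_md5(path) == published_hex.strip().lower()
```

Call `matches_published_md5` with the installer path and the MD5 value shown on the download page; it returns `True` when the two agree.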
- IPython: Architecture for interactive computing and presentation
- Pandas: Python Data Analysis Library
- Scikit Learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
- Matplotlib: Python 2D plotting library
- Detecting Algorithmically Generated Domains (BSidesDFW 2013)
- Hierarchical Clustering of Syslogs (BSidesDFW 2013)
- Exploration of data from Malware Domain List (BSidesDFW 2013)
- SQL Injection (Shmoocon 2014)
- Browser Agent Fingerprinting (Shmoocon 2014)
- PE File Classification (BSides 2014)
- PCAP Exploration (BSidesATX 2014)
- Drive-By PCAP Analysis (ISSW 2014)
- Mach-O Classification (SANS DFIR 2014)
- Yara Clustering (BSides Las Vegas 2014)
- SWF Classification (ShmooCon 2015)
- Java Class File Classification (ShmooCon 2015)
- Windows Executable Clustering by Image Similarity
- PE File Similarity Graph using Workbench
##### Setup:
- Required packages:
  - Brew/apt-get
    - graphviz, freetype, zmq
  - Python
    - ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, pefile, macholib
- Some of the exercises use packages from the data_hacking repository. To install those packages into your Python site-packages:
%> sudo python setup.py install
- To uninstall:
%> sudo pip uninstall data_hacking
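Before opening a notebook, it can help to confirm that the Python packages listed above are visible to your interpreter. The sketch below is one way to do that with the standard library only; `REQUIRED_MODULES` and `missing_modules` are illustrative names of ours, and the list uses import names rather than install names (the `ipython` package imports as `IPython`, and `pyzmq` as `zmq`).

```python
import importlib.util

# Import names for the Python packages listed above. Two differ from their
# install names: the ipython package imports as "IPython", pyzmq as "zmq".
REQUIRED_MODULES = [
    "IPython", "pygraphviz", "pandas", "matplotlib", "networkx", "zmq",
    "jinja2", "scipy", "patsy", "statsmodels", "pefile", "macholib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on this interpreter."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Anything reported as missing still has to be installed by whichever route you chose; this only tells you what the current interpreter can see.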
There are quite a few Google results for this, and we actually have mixed feelings about the IPython install instructions on the IPython page. The directions work, but they direct you to download and install Anaconda or the free edition of Enthought Canopy. Both of these are prepackaged Python distributions with a bunch of stuff like NumPy, SciPy, IPython, Matplotlib, Pandas, ... Occasionally these will have a hitch, and then you might be a bit SOL because StackOverflow is going to say "WTF are those things? Just do '$ pip install blah' or '$ brew install blah'."
So we recommend you be brave and do it the normal way... in particular this guy seems to have a pretty good write up for Mac installs:
Most of the notebooks will have relative paths to some resources, data files or images. In general the easiest way we found to run ipython on the notebooks is to change into that project directory and run ipython with this alias (put in your .bashrc or whatever):
alias ipython='ipython notebook --FileNotebookManager.notebook_dir=`pwd`'
$ cd data_hacking/fun_with_syslog
$ ipython (as aliased above)
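Because the notebooks refer to data files and images by relative path, a quick check that those paths resolve from the project directory can save some head-scratching. The following is a rough sketch, not part of the data_hacking package: `working_directory` and `missing_resources` are hypothetical helpers, and any file names you pass in are your own.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Always return to where we started, even if the body raised.
        os.chdir(previous)


def missing_resources(project_dir, relative_paths):
    """Return the relative paths that do not exist when resolved from project_dir."""
    with working_directory(project_dir):
        return [p for p in relative_paths if not Path(p).exists()]
```

For example, calling `missing_resources("data_hacking/fun_with_syslog", [...])` with the relative paths a notebook uses returns the ones that would fail to load from that directory.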