Music genre classification

Description

A tool that classifies songs by genre: it extracts a 30-second sample from each song, computes MFCC features, and assigns a genre with a k-NN classifier that compares songs via the Kullback-Leibler divergence.

Installing Python

The project requires the Python development headers and the Tk bindings, which can be installed on Debian/Ubuntu with:

sudo apt-get install python-dev python-tk

Installing the dependencies

We provide detailed instructions only for Linux users, but Windows users can easily install all of these dependencies as well.

FFmpeg

FFmpeg is used to decode audio files and to convert samples to .wav format. To install FFmpeg, run the following commands:

sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg

Full information about installing FFmpeg can be found on its PPA page.

Pip

Pip is a package management system used to install and manage software packages written in Python. You can install it as shown below:

sudo apt-get -y install python-pip

Pydub library

The pydub library is a useful module for working with audio files. We use it to extract a 30-second sample from each song, which serves as training or testing data. To install it, run:

pip install pydub

Python speech features

Python speech features is a library that provides common speech features for ASR, including MFCCs and filterbank energies. We use it to calculate the Mel Frequency Cepstral Coefficients of each song. To install it, download the zip archive from its GitHub page and unpack it, or clone the repository (if you have git installed):

git clone https://github.com/jameslyons/python_speech_features.git

Then install the package so it is available to your project:

cd ./python_speech_features
sudo python setup.py install

Scikit-learn

Scikit-learn is a machine learning library for Python; we use it for the classification step. To install it, run:

pip install -U scikit-learn
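A minimal sketch of a k-NN classifier in scikit-learn. The toy 2-D vectors and genre labels below are made up, standing in for the per-song statistics derived from the MFCCs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy feature vectors standing in for per-song MFCC statistics.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
y_train = np.array(["classical", "classical", "metal", "metal"])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.predict([[0.1, 0.1]]))  # -> ['classical']
```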

Mel Frequency Cepstral Coefficients (MFCC)

Mel Frequency Cepstral Coefficients are a compact representation of the short-term power spectrum of a sound, computed on the mel scale, which approximates the frequency resolution of human hearing. The MFCC matrix of each 30-second sample is the feature set from which we derive the mean and covariance used for classification.

k-Nearest Neighbors algorithm (k-NN)

k-NN is a non-parametric method used for classification. The input consists of the k closest training examples in the feature space. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the vote than more distant ones. More information about this algorithm can be found on Wikipedia.
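The voting scheme described above can be sketched in a few lines. Euclidean distance is used here purely for illustration; the project's actual distance between songs is the Kullback-Leibler divergence:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3, weighted=False):
    """Classify `query` by a (optionally distance-weighted) vote of its
    k nearest training examples. `train` is a list of (vector, label)."""
    dist = math.dist  # Euclidean, for illustration only
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter()
    for vec, label in neighbors:
        d = dist(vec, query)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "classical"), ((0.5, 0.2), "classical"),
         ((5.0, 5.0), "metal"), ((5.5, 4.8), "metal")]
print(knn_predict(train, (0.3, 0.1), k=3))  # -> classical
```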

To measure the distance between two songs we use the Kullback-Leibler divergence. Each song is modeled as a multivariate Gaussian distribution whose mean and covariance are derived from its MFCC matrix. The divergence between two such distributions is computed with the following formula:

$$
D_{\mathrm{KL}}\left(\mathcal{N}_0 \,\|\, \mathcal{N}_1\right) = \frac{1}{2}\left( \operatorname{tr}\left(\Sigma_1^{-1}\Sigma_0\right) + (\mu_1 - \mu_0)^{\mathsf{T}} \Sigma_1^{-1} (\mu_1 - \mu_0) - d + \ln\frac{\det\Sigma_1}{\det\Sigma_0} \right)
$$

where $\mu_0, \mu_1$ are the means, $\Sigma_0, \Sigma_1$ are the covariance matrices, $d$ is the dimension, and $\operatorname{tr}$ denotes the trace of a square matrix.

More information about Kullback-Leibler divergence can be found on Wikipedia.
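A sketch of this closed-form divergence in NumPy. Note that the KL divergence is not symmetric, so when a true distance is needed the symmetrized sum D(p‖q) + D(q‖p) is often used:

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """KL divergence D(N0 || N1) between two multivariate Gaussians,
    given their means (mu) and covariance matrices (cov)."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Identical distributions have zero divergence:
mu, cov = np.zeros(2), np.eye(2)
print(kl_gauss(mu, cov, mu, cov))  # -> 0.0
```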