TBD
TBD
sudo apt-get install python-dev python-tk
We provide detailed instructions only for Linux users, but Windows users can also easily install all of these dependencies.
FFmpeg is used to decode audio files and to convert samples to .wav format. To install FFmpeg, run the following commands:
sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg
Full information about installing FFmpeg can be found on its PPA page.
Pip is a package management system used to install and manage software packages written in Python. You can install it as shown below:
sudo apt-get -y install python-pip
The pydub library is a useful module for working with audio files. We use it to extract a 30-second sample from each song, which then serves as training or testing data. To install it, run:
pip install pydub
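As an illustration of the sampling step, here is a minimal sketch of cutting a 30-second sample with pydub. The helper names `sample_window_ms` and `export_sample` are hypothetical, and taking the sample from the middle of the track is an assumption; this README does not say which part of the song is used.

```python
def sample_window_ms(duration_ms, sample_ms=30_000):
    """Return (start, end) in milliseconds of a window centered in the track.

    Centering the window is an assumption made for this sketch.
    """
    if duration_ms <= sample_ms:
        # Track is shorter than the requested sample: keep all of it.
        return 0, duration_ms
    start = (duration_ms - sample_ms) // 2
    return start, start + sample_ms


def export_sample(src_path, dst_path, sample_ms=30_000):
    """Cut a sample out of src_path and write it to dst_path as WAV."""
    # Imported here so sample_window_ms stays usable without pydub installed.
    from pydub import AudioSegment

    song = AudioSegment.from_file(src_path)  # non-WAV input is decoded via FFmpeg
    start, end = sample_window_ms(len(song), sample_ms)  # len() is the duration in ms
    song[start:end].export(dst_path, format="wav")
```

For example, `export_sample("song.mp3", "song_sample.wav")` would write the middle 30 seconds of a hypothetical `song.mp3` as a .wav file.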
python_speech_features is a library that provides common speech features for ASR, including MFCCs and filterbank energies. We use it to calculate Mel Frequency Cepstral Coefficients for each song. To install it, download the zip archive from its GitHub page and unpack it, or clone the git repository (if you have git installed):
git clone https://github.com/jameslyons/python_speech_features.git
Then install the library so that it can be used in your project:
cd ./python_speech_features
sudo python setup.py install
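Once the library is installed, computing the MFCC matrix for a .wav sample might look like the sketch below. The helper names `to_mono` and `song_mfcc` are hypothetical, reading the file with `scipy.io.wavfile` is an assumption (SciPy is not listed among the dependencies above), and averaging stereo channels down to mono is one possible choice, not necessarily what this project does.

```python
import numpy as np


def to_mono(signal):
    """Average the channels of a (samples, channels) array into one channel."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim == 2:
        return signal.mean(axis=1)
    return signal


def song_mfcc(wav_path):
    """Return the MFCC matrix (frames x coefficients) for a WAV file."""
    # Imported here so to_mono stays usable without the optional packages.
    from scipy.io import wavfile
    from python_speech_features import mfcc

    rate, signal = wavfile.read(wav_path)
    # mfcc(signal, samplerate) uses the library's default window settings.
    return mfcc(to_mono(signal), rate)
```

Each row of the returned matrix describes one short frame of audio, and each column holds one cepstral coefficient.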
Scikit-learn TBD
pip install -U scikit-learn
TBD
k-NN is a non-parametric method used for classification. The input consists of the k closest training examples in the feature space. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the vote than more distant ones. More information about this algorithm can be found on Wikipedia.
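The majority vote described above can be sketched in a few lines. This is an illustrative, unweighted version and not this project's implementation; `knn_classify`, its parameters, and the layout of `training` as a list of `(features, label)` pairs are all hypothetical.

```python
from collections import Counter


def knn_classify(query, training, k, distance):
    """Return the most common label among the k training items nearest to query.

    training: list of (features, label) pairs
    distance: function taking two feature objects and returning a number
    """
    # Sort the training items by their distance to the query and keep the k closest.
    neighbors = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    # Unweighted majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With toy one-dimensional features and `distance=lambda a, b: abs(a - b)`, a query that lies close to the "rock" examples is labeled "rock". Choosing an odd k reduces the chance of a tied vote when there are two classes.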
To measure the distance between two songs we use the Kullback-Leibler divergence. Each song is modeled as a multivariate Gaussian distribution whose mean and covariance are derived from the song's MFCC matrix. To compute the distance we use the following formula:
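$$D_{KL}(p \,\|\, q) = \frac{1}{2}\left[\operatorname{tr}\left(\Sigma_q^{-1}\Sigma_p\right) + (\mu_q - \mu_p)^{T}\,\Sigma_q^{-1}\,(\mu_q - \mu_p) - d + \ln\frac{\det\Sigma_q}{\det\Sigma_p}\right]$$
where $\mu_p$, $\mu_q$ are the means, $\Sigma_p$, $\Sigma_q$ are the covariance matrices, and $d$ is the dimension of the distributions (the number of MFCC coefficients).
$\operatorname{tr}(\cdot)$ is the trace of a square matrix.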
More information about Kullback-Leibler divergence can be found on Wikipedia.
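To make the distance computation concrete, here is a sketch of the standard closed-form KL divergence between two multivariate Gaussians, written with NumPy. The function names `gaussian_summary` and `kl_divergence` are hypothetical, and whether this project uses exactly this form (for example with or without the factor of 1/2, or a symmetrized variant) is an assumption.

```python
import numpy as np


def gaussian_summary(mfcc_matrix):
    """Summarize a (frames x coefficients) MFCC matrix as a mean and covariance."""
    mean = mfcc_matrix.mean(axis=0)
    cov = np.cov(mfcc_matrix, rowvar=False)  # columns are the variables
    return mean, cov


def kl_divergence(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(p || q) for two Gaussians of the same dimension."""
    d = mean_p.shape[0]
    inv_q = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    # slogdet returns (sign, log|det|); it is more stable than log(det(...)).
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(inv_q @ cov_p)   # trace term
        + diff @ inv_q @ diff     # distance between the means
        - d
        + logdet_q - logdet_p     # log-ratio of the determinants
    )
```

Note that the KL divergence is not symmetric: `kl_divergence(m1, c1, m2, c2)` generally differs from `kl_divergence(m2, c2, m1, c1)`, so the order of the two songs matters unless the two directions are combined.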