- If the distance between data points depends heavily on a few coordinates, 'smooth' the data using the Fast Walsh-Hadamard transform before running the software (for l2 distance).
- Dependencies: g++-4.9, gcc-4.9, cmake, libboost-all-dev, build-essential, libhdf5-serial-dev.
- Run
./utils/build.sh
to build all the binaries.
- The file aws_server.ini stores all the information (such as datapath, saveResultsPath) required by the software.
- All the figures can be generated with the IPython notebooks in the 'figure' folder.
- The figures are generated from the experiments stored in the 'experiments' folder.
- The stored experiments for dense datasets can be reproduced with the following commands:
- K-nearest neighbours:
./build/knn aws_server.ini start-index end-index
(finds the k nearest points for each point from start-index to end-index)
- K-means:
./build/kmeans aws_server.ini
- Hierarchical:
./build/heirarchical aws_server.ini random-seed
- Mutual Information Feature Selection:
./build/gasmmi aws_server.ini number-features sample-size random-seed
- The stored experiments for sparse datasets can be reproduced with the following commands:
- K-nearest neighbours:
./build/knn10x aws_server.ini start-index end-index
(finds the k nearest points for each point from start-index to end-index)
- K-means:
./build/kmeans10x aws_server.ini
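The first note above recommends smoothing the data with a Fast Walsh-Hadamard transform when a few coordinates dominate the l2 distance. The following is a minimal NumPy sketch of that preprocessing step, not code from this repository. The function names `fwht` and `smooth`, the zero-padding to a power of two, and the random sign flip are all assumptions made for illustration. Because the transform is orthonormal, pairwise l2 distances are unchanged while the difference between two points is spread across all coordinates.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last axis must have a power-of-two length.
    """
    x = np.asarray(x, dtype=np.float64).copy()  # contiguous working copy
    d = x.shape[-1]
    if d == 0 or d & (d - 1):
        raise ValueError("last axis length must be a power of two")
    h = 1
    while h < d:
        # View the last axis as blocks of size 2h and apply one butterfly stage.
        y = x.reshape(x.shape[:-1] + (d // (2 * h), 2, h))
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        y[..., 0, :] = a
        y[..., 1, :] = b
        h *= 2
    return x / np.sqrt(d)  # scaling makes the transform distance-preserving


def smooth(points, seed=0):
    """Hypothetical helper: pad, flip signs at random, then apply the FWHT.

    points is an (n, d) array. Padding and the sign flip are assumptions of
    this sketch, not documented behaviour of the software.
    """
    points = np.asarray(points, dtype=np.float64)
    n, d = points.shape
    padded_d = 1 << (d - 1).bit_length()  # next power of two >= d
    padded = np.zeros((n, padded_d))
    padded[:, :d] = points
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=padded_d)
    return fwht(padded * signs)
```

For example, two points that differ only in their first coordinate keep the same l2 distance after `smooth`, but the difference is shared equally by every output coordinate. The smoothed array would then be written back in whatever format the datapath in aws_server.ini expects, which this sketch does not cover.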