Skip to content

amyxzhang/OPTICS-Automatic-Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 

Repository files navigation

Automatic Clustering of Hierarchical Clustering Representations
 
Library Dependencies: numpy, if graphing is desired - matplotlib
OPTICS implementation used has dependencies include numpy, scipy, and hcluster

An implementation of the following algorithm, with some minor add-ons:
J. Sander, X. Qin, Z. Lu, N. Niu, A. Kovarsky, K. Whang, J. Jeon, K. Shim, J. Srivastava, Automatic extraction of clusters from hierarchical clustering representations. Advances in Knowledge Discovery and Data Mining (2003) Springer Berlin / Heidelberg. 567-567
available from http://dx.doi.org/10.1007/3-540-36175-8_8

Implemented in Python by Amy X. Zhang, while at Cambridge Computer Laboratory in March 2012. Now at MIT.
[email protected]
http://people.csail.mit.edu/axz

This algorithm assumes you have already obtained a hierarchical clustering representation  of your data, through OPTICS or other means. The clustering algorithm I use is OPTICS and an implementation in python can be found here: http://chemometria.us.edu.pl/download/optics.py. See the demo on how to use the OPTICS code with this.

OPTICS is a hierarchical clustering algorithm for finding density-based clusters in spatial data. It returns a list of points linearly ordered so that spatially close points are neighbors as well as an associated reachability value for every point. This makes up a reachability plot, a special kind of dendrogram. For this algorithm, the input is a reachability plot. If necessary, dendrograms can be converted to reachability plots and vice versa, as outlined in the above paper.
 
The algorithm takes in a reachability plot and list of points ordered by OPTICS. It returns the root node of the hierarchical clustering tree created by finding the local maximas in the reachability plot and choosing whether to use or discard them. Please see the paper for a more detailed description. There are a few changes in the algorithm given in the paper, and they are noted in the code. For functionality, I have added helper functions to print out, graph, write to file, or extract only leaves from the tree. See demo for examples on using them.

This code was used for the following publication (Bibtex format):

@inproceedings{zhang2013hoodsquare,
  title={Hoodsquare: Modeling and recommending neighborhoods in location-based social networks},
  author={Zhang, Amy X and Noulas, Anastasios and Scellato, Salvatore and Mascolo, Cecilia},
  booktitle={Social Computing (SocialCom), 2013 International Conference on},
  pages={69--74},
  year={2013},
  organization={IEEE}
}

About

automatic hierarchical clustering of OPTICS reachability plots implemented in Python

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages