KMeans
Taylor G Smith edited this page Apr 23, 2016 · 2 revisions
k-means clustering is a partitional clustering algorithm that assigns each observation to its nearest center point, or centroid. After each iteration, every centroid is recomputed as the mean of the observations assigned to its cluster, and the algorithm continues until convergence, which occurs when the total within-cluster sum of squares (the "within sum of squares" summed over all clusters) drops below some predefined threshold.
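The assign-then-recompute loop described above can be sketched in plain Java. This is a minimal illustration, not the clust4j source: the class `KMeansSketch`, the method `fit`, and the parameters `maxIter` and `tol` are hypothetical names. Two simplifying assumptions are made: centroids are seeded from the first k rows, and the loop stops when the total within-cluster sum of squares changes by less than `tol` between iterations, which is one common reading of the threshold test and may differ from what clust4j actually does.

```java
import java.util.Arrays;

class KMeansSketch {
    /** Squared Euclidean distance between two points of equal dimension. */
    public static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /**
     * Lloyd-style k-means. Assumes 1 <= k <= x.length and a non-empty matrix.
     * Centroids are seeded with the first k rows (an assumption made for brevity).
     * Returns one cluster label per row of x.
     */
    public static int[] fit(double[][] x, int k, int maxIter, double tol) {
        int m = x.length, n = x[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = x[c].clone();

        int[] labels = new int[m];
        double prevWss = Double.POSITIVE_INFINITY;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each observation goes to its nearest centroid.
            double wss = 0.0;
            for (int i = 0; i < m; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = sqDist(x[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                labels[i] = best;
                wss += bestDist; // accumulate total within-cluster sum of squares
            }

            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][n];
            int[] counts = new int[k];
            for (int i = 0; i < m; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < n; j++) sums[labels[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid unchanged
                for (int j = 0; j < n; j++) centroids[c][j] = sums[c][j] / counts[c];
            }

            // Convergence test (assumed): stop once the total WSS stops changing.
            if (Math.abs(prevWss - wss) < tol) break;
            prevWss = wss;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] x = { {0.0, 0.0}, {10.0, 10.0}, {0.1, 0.2}, {10.1, 9.9} };
        System.out.println(Arrays.toString(fit(x, 2, 100, 1e-6))); // prints [0, 1, 0, 1]
    }
}
```

Seeding from the first k rows keeps the sketch deterministic; a real implementation would normally use a randomized or k-means++ style initialization.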
The clust4j implementation of k-means handles several corner cases:
- If k is less than 1, an `IllegalArgumentException` is thrown.
- If k is greater than m, the number of observations in the input matrix, an `IllegalArgumentException` is thrown.
- If all values in the input matrix are exactly identical (a common mistake is feeding in a just-initialized matrix of all zeroes), `KMeans` behaves as though k were exactly 1, and every observation is assigned to the same cluster.
- If the `GeometricallySeparable` metric you are using produces equal distances or similarities between every pair of observations (this can happen with some obscure `Kernel` classes), the algorithm acts as though k were 1 and assigns every observation to the same cluster, regardless of your initial k value.
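The first three corner cases can be illustrated with a small standalone check. This is a hedged sketch of the described behavior, not clust4j code: `KMeansCornerCases`, `allValuesIdentical`, and `effectiveK` are hypothetical names, and the exception messages are invented for illustration. The equal-distance case is omitted because it depends on the particular metric in use.

```java
class KMeansCornerCases {
    /** True if every entry of the matrix equals its first entry (or the matrix is empty). */
    public static boolean allValuesIdentical(double[][] x) {
        if (x.length == 0 || x[0].length == 0) return true;
        double first = x[0][0];
        for (double[] row : x) {
            for (double v : row) {
                if (v != first) return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical helper: validates k against the m rows of x and returns the
     * number of clusters that would effectively be used.
     */
    public static int effectiveK(double[][] x, int k) {
        int m = x.length;
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1, got " + k);
        }
        if (k > m) {
            throw new IllegalArgumentException("k (" + k + ") exceeds the number of observations (" + m + ")");
        }
        // A constant matrix cannot be split, so behave as though k were 1.
        return allValuesIdentical(x) ? 1 : k;
    }

    public static void main(String[] args) {
        double[][] zeros = new double[5][3]; // just-initialized matrix of all zeroes
        System.out.println(effectiveK(zeros, 3)); // prints 1

        double[][] data = { {1.0, 2.0}, {3.0, 4.0} };
        System.out.println(effectiveK(data, 2)); // prints 2
    }
}
```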