Taylor G Smith edited this page Apr 23, 2016 · 2 revisions

KMeans

k-means clustering is a partitional clustering algorithm that assigns each observation to its nearest center point, or centroid. After each iteration, every centroid is recomputed as the mean of the observations assigned to its cluster. The algorithm repeats until convergence, i.e., until the change in the total within-cluster sum of squares between iterations drops below a predefined tolerance.
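The assign-then-update loop described above (Lloyd's algorithm) can be sketched in plain Java. This is a minimal illustration, not clust4j's implementation. The class `LloydKMeansSketch`, the `fit` method and its `tol`, `maxIter` and `seed` parameters are hypothetical. The sketch also assumes squared Euclidean distance and seeds the centroids from k randomly chosen rows, and clust4j may initialize differently.

```java
import java.util.Random;

/**
 * Hypothetical sketch of Lloyd's k-means iteration; not clust4j's implementation.
 * Assumes squared Euclidean distance and random-row initialization.
 */
class LloydKMeansSketch {

    /** Squared Euclidean distance between two equal-length rows. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /**
     * Clusters the rows of X into k groups (requires 1 <= k <= X.length) and
     * returns one label per row. Stops once the change in total within-cluster
     * sum of squares (WSS) between iterations is below tol, or after maxIter.
     */
    static int[] fit(double[][] X, int k, double tol, int maxIter, long seed) {
        int m = X.length, n = X[0].length;

        // Initialize centroids from k distinct row indices (partial Fisher-Yates shuffle).
        Random rng = new Random(seed);
        int[] idx = new int[m];
        for (int i = 0; i < m; i++) idx[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int r = c + rng.nextInt(m - c);
            int tmp = idx[c]; idx[c] = idx[r]; idx[r] = tmp;
            centroids[c] = X[idx[c]].clone();
        }

        int[] labels = new int[m];
        double prevWss = Double.POSITIVE_INFINITY;
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each row joins its nearest centroid.
            double wss = 0.0;
            for (int i = 0; i < m; i++) {
                int best = 0;
                double bestD = sqDist(X[i], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = sqDist(X[i], centroids[c]);
                    if (d < bestD) { bestD = d; best = c; }
                }
                labels[i] = best;
                wss += bestD;
            }

            // Convergence: stop when WSS barely changed since the last iteration.
            if (Math.abs(prevWss - wss) < tol) break;
            prevWss = wss;

            // Update step: each centroid becomes the mean of its assigned rows.
            double[][] sums = new double[k][n];
            int[] counts = new int[k];
            for (int i = 0; i < m; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < n; j++) sums[labels[i]][j] += X[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its previous centroid
                for (int j = 0; j < n; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }
}
```

For example, calling `LloydKMeansSketch.fit(X, 2, 1e-8, 100, 42L)` on two well-separated groups of points should give every point in a group the same label. Keeping an empty cluster's previous centroid is only one simple choice. Other implementations reseed it instead.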

The clust4j implementation of k-means handles several corner cases as follows:

  • If k is less than 1, an IllegalArgumentException will be thrown
  • If k is greater than m, the number of observations in the input matrix, an IllegalArgumentException will be thrown
  • If all values in the input matrix are exactly identical (a common input mistake is passing a freshly initialized matrix of all zeroes), KMeans will behave as though k were exactly 1 and assign every observation to the same cluster
  • If the GeometricallySeparable metric you are using produces equal distances/similarities between every pair of observations (which can happen with some obscure Kernel classes), the algorithm will act as though k were 1 and assign every observation to the same cluster, regardless of your initial k value!
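The first three checks above can be sketched as a hypothetical pre-fit step in Java. The class `KMeansCornerCases` and its methods are illustrative names, not clust4j's API. The sketch assumes rows are compared with exact equality via `Arrays.equals`. The fourth case depends on the metric or kernel in use, so it is not modeled here.

```java
import java.util.Arrays;
import java.util.Optional;

/**
 * Hypothetical pre-fit checks mirroring the corner cases above; not clust4j code.
 */
class KMeansCornerCases {

    /** Throws IllegalArgumentException unless 1 <= k <= m (m = number of rows). */
    static void validateK(double[][] X, int k) {
        int m = X.length;
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1, got " + k);
        }
        if (k > m) {
            throw new IllegalArgumentException(
                "k (" + k + ") exceeds the number of observations (" + m + ")");
        }
    }

    /** True if every row equals the first row exactly (Arrays.equals semantics). */
    static boolean allRowsIdentical(double[][] X) {
        for (int i = 1; i < X.length; i++) {
            if (!Arrays.equals(X[0], X[i])) {
                return false;
            }
        }
        return true;
    }

    /**
     * Validates k, then handles the all-identical case by putting every
     * observation in cluster 0, as if k were 1. An empty result means no
     * shortcut applies and ordinary k-means should run.
     */
    static Optional<int[]> degenerateLabels(double[][] X, int k) {
        validateK(X, k);
        if (allRowsIdentical(X)) {
            return Optional.of(new int[X.length]); // Java int arrays start as all zeros
        }
        return Optional.empty();
    }
}
```

Returning an `Optional` keeps the shortcut explicit. A caller runs the regular k-means iteration only when the result is empty.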