KMeans

k-means clustering is a partitional clustering algorithm that assigns each observation to its nearest center point, or centroid. After each iteration, every centroid is recomputed as the mean of the observations currently assigned to its cluster. The algorithm continues until convergence, that is, until the total within-cluster sum of squares (WSS) changes by less than a predefined tolerance between iterations.
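The loop described above can be sketched in plain Java. This is an illustrative sketch only, not the clust4j implementation: the class `KMeansSketch`, the method `fit`, and the parameters `maxIter` and `tol` are hypothetical names, and the sketch assumes squared Euclidean distance, takes the first k rows as initial centroids, and stops when the total WSS changes by less than `tol` between iterations.

```java
import java.util.Arrays;

public class KMeansSketch {
    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Hypothetical helper: returns one cluster label per row of x.
     * Assumes 1 <= k <= x.length; the first k rows seed the centroids.
     */
    static int[] fit(double[][] x, int k, int maxIter, double tol) {
        int m = x.length, n = x[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = x[c].clone();

        int[] labels = new int[m];
        double lastWss = Double.POSITIVE_INFINITY;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each observation joins its nearest centroid.
            double wss = 0.0;
            for (int i = 0; i < m; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(x[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                labels[i] = best;
                wss += bestDist;
            }

            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][n];
            int[] counts = new int[k];
            for (int i = 0; i < m; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < n; j++) sums[labels[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < n; j++) centroids[c][j] = sums[c][j] / counts[c];
            }

            // Convergence check (assumed criterion): WSS has stopped changing.
            if (Math.abs(lastWss - wss) < tol) break;
            lastWss = wss;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] x = { {0.0, 0.0}, {5.0, 5.0}, {0.1, 0.2}, {5.1, 4.9} };
        System.out.println(Arrays.toString(fit(x, 2, 100, 1e-6)));
    }
}
```

With the four points in `main`, the two observations near the origin end up in one cluster and the two near (5, 5) in the other.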

The clust4j implementation of k-means handles several corner cases as follows:

If k is less than 1, an IllegalArgumentException will be thrown.
If k is greater than m, the number of observations in the input matrix, an IllegalArgumentException will be thrown.
If all values in the input matrix are exactly identical (a common mistake is passing a just-initialized matrix of all zeroes), KMeans will behave as though k were exactly 1, and every observation will be assigned to the same cluster.
If the GeometricallySeparable metric you are using produces equal distances or similarities between every pair of observations (this can happen with some obscure Kernel classes), the algorithm will act as though k were 1 and assign every observation to the same cluster, regardless of your initial k value.
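The first three rules above can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not clust4j code: `KMeansCornerCases`, `validateK`, `allValuesIdentical`, and `effectiveK` are hypothetical names, the exception messages are made up, and the metric-dependent fourth rule is left out because it depends on the distance or kernel class in use.

```java
public class KMeansCornerCases {
    /** Rejects k < 1 and k > m, mirroring the first two rules. */
    static void validateK(int k, int m) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1, got " + k);
        }
        if (k > m) {
            throw new IllegalArgumentException(
                "k (" + k + ") exceeds the number of observations (" + m + ")");
        }
    }

    /** True when every entry of the matrix exactly equals the first entry. */
    static boolean allValuesIdentical(double[][] x) {
        double first = x[0][0];
        for (double[] row : x) {
            for (double v : row) {
                if (v != first) return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical helper: the number of clusters that would actually be fit.
     * Falls back to 1 when the input matrix carries no variation at all.
     */
    static int effectiveK(double[][] x, int k) {
        validateK(k, x.length);
        return allValuesIdentical(x) ? 1 : k;
    }

    public static void main(String[] args) {
        double[][] zeros = new double[3][2]; // just-initialized, all zeroes
        System.out.println(effectiveK(zeros, 2)); // collapses to a single cluster
    }
}
```

Checking k before inspecting the data means an invalid k is reported even when the matrix would otherwise collapse to a single cluster; the ordering clust4j itself uses is not stated above, so treat that choice as an assumption.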
