Fix typos in README #99

Open. Wants to merge 1 commit into base: master
6 changes: 3 additions & 3 deletions 5_Clustering/README.md
@@ -13,13 +13,13 @@
### Introduction
K-Means is an unsupervised machine learning algorithm. The algorithm divides the data points into k groups (called clusters), where each data point can belong to only one cluster. K-Means aims to group together similar data points into the same cluster, while keeping different clusters as far apart as possible.

-Each cluster has a center, which is a data point that represents the center of the cluster. A data point gets added to a cluster whose center is closest to that data point. Distance between points is measures using sum of squared distances method.
+Each cluster has a center, which is a data point that represents the center of the cluster. A data point gets added to a cluster whose center is closest to that data point. Distance between points is measured using sum of squared distances method.
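
To make the distance rule concrete, here is a minimal sketch of the assignment step. It assumes "sum of squared distances" means the squared Euclidean distance between a point and a center; the helper names `squared_distance` and `nearest_center` are hypothetical and do not come from the README.

```python
def squared_distance(p, q):
    """Sum of squared coordinate differences between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


# Two illustrative centers; each point joins the cluster of its nearest center.
centers = [(0.0, 0.0), (10.0, 10.0)]
print(nearest_center((1.0, 2.0), centers))
print(nearest_center((9.0, 8.0), centers))
```

Skipping the square root is safe here because it does not change which center is closest.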

### Algorithm

1. Select the number of clusters, k
2. Appoint k data points as cluster centers (either random assignment, or space them as far apart as possible)
-3. Until cluster assignments do not change, do the following for each data point:
+3. Until cluster assignments do not change, do the following for each data point:
     1. Calculate the sum of squared distance between it and all the cluster centers.
     2. Assign the point to the cluster having the closest center.
     3. Recalculate the center for clusters by taking the average of all data points assigned to that cluster.
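
The steps above can be sketched in plain Python. This is one possible reading rather than the README's implementation: it uses the common batch variant (assign every point, then recompute all centers), picks random data points as starting centers unless `initial_centers` is given, and keeps the old center if a cluster ends up empty. The function name `kmeans` and its parameters are assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Sum of squared coordinate differences between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, initial_centers=None, max_iter=100, seed=0):
    """Batch K-Means sketch: returns (labels, centers)."""
    # Step 2: appoint k data points as the starting centers.
    if initial_centers is None:
        centers = random.Random(seed).sample(points, k)
    else:
        centers = list(initial_centers)

    labels = None
    for _ in range(max_iter):
        # Steps 3.1 and 3.2: assign each point to its closest center.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3.3: move each center to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2,
                         initial_centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centers)
```

Passing `initial_centers` makes the run reproducible; with random starts, the result can depend on the seed.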
@@ -31,4 +31,4 @@ Each cluster has a center, which is a data point that represents the center of t

## Hierarchical Clustering - Agglomerative Clustering

-Initially, each data point is treated as an independent cluster. At each step, the two closest clusters are merged to become one cluster. This process continues until only a single cluster remains. Once the process is complete, we can cut the tree into clusters as needed.
+Initially, each data point is treated as an independent cluster. At each step, the two closest clusters are merged to become one cluster. This process continues until only a single cluster remains. Once the process is complete, we can cut the tree into clusters as needed.
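
The merge procedure can be sketched as follows. The README does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between their two closest members). Instead of building the full tree and cutting it afterwards, it stops merging once `n_clusters` clusters remain, which gives the same grouping as cutting the tree at that level. The function names are hypothetical.

```python
def squared_distance(p, q):
    """Sum of squared coordinate differences between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(a, b):
    """Distance between clusters a and b: their closest pair of members."""
    return min(squared_distance(p, q) for p in a for q in b)


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    # Initially, every data point is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, n_clusters=3))
```

This brute-force search rescans every pair of clusters at each step, so it is only practical for small inputs.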