Unsupervised machine learning models

Hierarchical Clustering

Split

Consider all the samples as one cluster, then recursively split it into sub-clusters according to the distances between samples (divisive, top-down).

Combine

Consider each sample as its own cluster, then repeatedly merge the two nearest clusters into one (agglomerative, bottom-up).
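A minimal sketch of the combine (agglomerative) approach using SciPy; the toy data and the choice of 2 clusters are illustrative assumptions, not from the original notes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D samples (illustrative data: two loose groups)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

# Agglomerative ("combine") clustering: start with each sample as its
# own cluster and repeatedly merge the two nearest clusters.
Z = linkage(X, method="ward")          # linkage matrix recording the merges

# Cut the merge tree so that at most 2 clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)                          # e.g. [1 1 1 2 2 2]
```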

K-means

Finding the optimal clustering is NP-hard; k-means is a heuristic that usually converges to a local optimum.

Steps:

  1. Select k random samples as the initial cluster centers
  2. For each of the remaining n-k samples, calculate its distance to every center
  3. Assign each sample to the cluster with the nearest center
  4. Recalculate each cluster's center as the mean of its members
  5. Repeat steps 3-4 until the centers no longer change
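A minimal NumPy sketch of these steps; the toy data, k, and iteration cap are illustrative assumptions, and empty-cluster handling is omitted for brevity:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random samples as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Steps 2-3: assign every sample to its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each center as the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centers, labels = kmeans(X, k=2)
print(labels)   # e.g. [0 0 1 1]
```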

How to decide the first centers
The randomly chosen initial centers influence the final clustering, so different runs can give different results.

Solutions:

  1. Run the algorithm several times with different random initial centers and keep the best result (see the sketch below)
  2. Use hierarchical clustering to find k clusters, then use their centers as the initial centers
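A sketch of solution 1 using scikit-learn, whose n_init parameter restarts k-means from several random initializations and keeps the run with the lowest within-cluster sum of squares; the random data here is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))  # illustrative data

# n_init=10 restarts k-means from 10 different random initial centers
# and keeps the run with the lowest inertia (within-cluster sum of squares).
km = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
print(km.inertia_)
```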

How to choose k
If k is not fixed by business rules, use the elbow rule: plot the diameter of the clusters against the value of k.

Elbow Rule

The x-axis is the value of k, and the y-axis is the diameter of the clusters, e.g., the sum of the squared distances between the points in a cluster and the cluster centroid.

The best k is where the curve bends sharply (the "elbow"); in the plot described above, that point is k = 3.
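A sketch of how such an elbow plot could be produced with scikit-learn and matplotlib; the data and the range of k are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(150, 2))  # illustrative data

# Inertia (sum of squared distances to the nearest centroid) for each k
ks = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("k")
plt.ylabel("within-cluster sum of squares")
plt.show()   # pick k at the bend ("elbow") of the curve
```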

Other Improvements for K-Means

Sensitive to Initial Values
k-means++: pick the initial centers one by one. The first center is a uniformly random sample; each later center is chosen with probability proportional to its squared distance from the nearest center already picked, so the initial centers tend to be spread out.
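A minimal sketch of the k-means++ seeding step, assuming Euclidean distance; the function name and toy inputs are illustrative:

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: spread out the initial centers."""
    rng = np.random.default_rng(seed)
    # First center: a uniformly random sample
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each sample to its nearest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Sample the next center with probability proportional to d2,
        # so far-away points are more likely to become centers
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```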

Sensitive to Noise and Outliers
Because each center is the mean of its cluster, a single outlier can drag it far from the bulk of the points; variants such as k-medoids (which restricts centers to actual samples) or k-median are more robust.
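A tiny illustration of the problem, with made-up values: one outlier drags the mean far more than the median.

```python
import numpy as np

cluster = np.array([1.0, 1.1, 0.9, 1.2, 100.0])  # last value is an outlier

print(cluster.mean())      # 20.84 -- the mean is dragged toward the outlier
print(np.median(cluster))  # 1.1   -- the median stays near the bulk
```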

Distance

Jaccard Distance
For two sets A and B, the Jaccard distance is 1 - |A ∩ B| / |A ∪ B|: identical sets have distance 0, disjoint sets have distance 1.

Cosine Distance
For two vectors x and y, the cosine distance is 1 - (x · y) / (||x|| ||y||), i.e. one minus the cosine of the angle between them; it ignores vector length and compares only direction.
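Minimal sketches of both distances; the example sets and vectors are illustrative:

```python
import numpy as np

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

def cosine_distance(x, y):
    """1 - cos(angle between x and y) for two vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))   # 0.5 (2 shared of 4 total)
print(cosine_distance([1, 0], [1, 1]))          # ≈ 0.293
```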