Spectral Clustering

How does Spectral Clustering work?

In spectral clustering, the data points are treated as nodes of a graph, so clustering becomes a graph partitioning problem. The nodes are then mapped to a low-dimensional space in which they can easily be separated into clusters. An important point to note is that no assumption is made about the shape or form of the clusters.


What are the steps for Spectral Clustering?

Spectral clustering involves 3 steps:
1. Compute a similarity graph
2. Project the data onto a low-dimensional space
3. Create clusters
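The three steps above can be sketched end to end with scikit-learn's SpectralClustering (assuming scikit-learn is available; the toy blobs below are illustrative). With affinity='rbf' it builds the fully connected Gaussian similarity graph (step 1), computes the spectral embedding (step 2), and runs k-means on the embedded points (step 3), all inside fit_predict:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),   # blob near the origin
               rng.normal(5, 0.3, size=(20, 2))])  # blob near (5, 5)

model = SpectralClustering(n_clusters=2, affinity='rbf', gamma=1.0,
                           random_state=0)
labels = model.fit_predict(X)  # one cluster label per point
```

Here gamma plays the role of the width parameter of the Gaussian similarity discussed below (larger gamma corresponds to a narrower neighborhood).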


Step 1 — Compute a similarity graph:

Fully connected graph: To construct this graph, we connect every pair of points and weight each edge by its similarity sij. Because this graph should model local neighborhood relationships, a similarity function that decays with distance, such as the Gaussian similarity function, is used:

sij = exp(−||xi − xj||² / (2σ²))

Here the parameter σ controls the width of the neighborhoods.

The choice of σ is critical in spectral clustering because it determines the scale at which the data points are clustered. If σ is too small, the affinity matrix becomes too sparse and local similarities are over-emphasized, so the clusters may fail to capture the underlying structure of the data. If σ is too large, the affinity matrix becomes too dense and global similarities dominate, so the clusters may be too broad and overlapping.

Thus, when we create an adjacency matrix for any of these graphs, Aij ≈ 1 when the points are close and Aij → 0 when the points are far apart.
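A minimal NumPy sketch of this behavior (the toy points below are illustrative): two nearby points get a similarity near 1, a distant point gets a similarity near 0.

```python
import numpy as np

# Hypothetical toy data: two close points and one far-away point.
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [5.0, 5.0]])

sigma = 1.0  # width of the Gaussian neighborhood

# Pairwise squared Euclidean distances.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Gaussian similarity: A_ij = exp(-||xi - xj||^2 / (2 * sigma^2))
A = np.exp(-sq_dists / (2 * sigma ** 2))

print(A[0, 1])  # close pair: similarity near 1
print(A[0, 2])  # distant pair: similarity near 0
```

Shrinking sigma drives A[0, 1] toward 0 as well (an overly sparse affinity matrix), which is exactly the failure mode described above.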

Step 2 — Project the data onto a low-dimensional space:

Our goal is then to transform the space so that points that are close always end up in the same cluster, and points that are far apart end up in different clusters. We need to project our observations into a low-dimensional space. For this, we compute the normalized graph Laplacian, which is just another matrix representation of a graph and can be useful in finding interesting properties of a graph. With A the adjacency (affinity) matrix and D the diagonal degree matrix (Dii = Σj Aij), it can be computed as:

Lsym = I − D^(−1/2) A D^(−1/2)
The eigenvectors of the Laplacian corresponding to the smallest eigenvalues provide a low-dimensional representation of the data points that captures the underlying structure of the data.

By keeping the eigenvectors of the k smallest eigenvalues, we retain the most informative components of the data and discard the noise. The rows of the resulting matrix are then treated as new feature vectors and grouped with a standard algorithm such as k-means, which is the final "create clusters" step. This approach often leads to better clustering results than algorithms that rely on distance measures in the original feature space.
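The projection step can be sketched in NumPy as follows (the helper spectral_embedding and the toy affinity matrix are illustrative assumptions, not a library API). For two tightly connected groups that are only weakly linked to each other, the embedded rows of same-group points land almost on top of each other:

```python
import numpy as np

def spectral_embedding(A, k):
    """Map each point to a row built from the eigenvectors of the
    k smallest eigenvalues of L_sym = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    return eigvecs[:, :k]                  # keep the k smallest

# Toy affinity: points 0-1 and points 2-3 form two tight groups
# joined only by a tiny cross-group similarity.
eps = 1e-6
A = np.array([[1.0, 1.0, eps, eps],
              [1.0, 1.0, eps, eps],
              [eps, eps, 1.0, 1.0],
              [eps, eps, 1.0, 1.0]])

emb = spectral_embedding(A, k=2)
# Rows of `emb` for points in the same group are nearly identical,
# so even a simple k-means pass separates the two groups cleanly.
```

This is why the final clustering step is easy: the hard geometry of the original space has been absorbed into the graph, and the embedded points are close precisely when they belong together.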

