DBSCAN algorithm

DBSCAN stands for density-based spatial clustering of applications with noise. It can find clusters of arbitrary shape and can separate those clusters from noise (i.e. outliers).
The main idea behind DBSCAN is that a point belongs to a cluster if it is close to many points from that cluster.
There are two key parameters of DBSCAN:

  • eps: The distance that defines a neighborhood. Two points are considered neighbours if the distance between them is less than or equal to eps.

  • minPts: The minimum number of data points (including the point itself) that must lie within a point's eps-neighborhood for that region to count as dense enough to form a cluster.
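
To make the role of eps concrete, here is a minimal sketch of a neighborhood lookup in plain Python. The function name `region_query`, the sample points, and the choice of Euclidean distance are assumptions made for illustration; DBSCAN itself can be used with other distance measures.

```python
import math

def region_query(points, index, eps):
    """Return the indices of all points within eps of points[index].

    The point itself is included, since its distance to itself is 0.
    Euclidean distance is an assumption made for this sketch.
    """
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

# Hypothetical sample data: three points close together and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]

neighbours = region_query(points, 0, eps=1.0)
print(neighbours)  # [0, 1, 2]
```

With minPts = 3, the neighborhood of the first point would be dense enough, because it contains three points including the point itself.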

Based on these two parameters, points are classified as core point, border point, or outlier:

  • Core point: A point is a core point if there are at least minPts points (including the point itself) within distance eps of it.

  • Border point: A point is a border point if it is reachable from a core point but has fewer than minPts points within distance eps of it.

  • Outlier: A point is an outlier if it is not a core point and is not reachable from any core point.
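
The three definitions can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a full DBSCAN implementation: it only labels points and does not assign them to clusters. The function name `classify_points`, the sample data, and the use of Euclidean distance are assumptions, and "reachable from a core point" is taken here to mean lying within eps of some core point.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier'."""
    n = len(points)
    # eps-neighborhood of every point, counting the point itself.
    neighbourhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbourhoods]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbourhoods[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

# Hypothetical sample data for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)  # ['core', 'border', 'core', 'border', 'outlier']
```

The pairwise distance computation here is quadratic in the number of points, which is fine for a small example; practical implementations typically use a spatial index to find neighbours.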

These points may be better explained with visualizations. 

