DBSCAN algorithm
DBSCAN stands for density-based spatial clustering of applications with noise. It can find arbitrarily shaped clusters and can handle data that contains noise (i.e. outliers).
The main idea behind DBSCAN is that a point belongs to a cluster if it is close to many points from that cluster.
There are two key parameters of DBSCAN:
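eps: The distance that specifies the neighborhoods. Two points are considered to be neighbors if the distance between them is less than or equal to eps.
minPts: The minimum number of data points (including the point itself) that must lie within a point's eps-neighborhood for that neighborhood to count as dense enough to define a cluster.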
Based on these two parameters, each point is classified as a core point, a border point, or an outlier:
Core point: A point is a core point if there are at least minPts points (including the point itself) in its surrounding area with radius eps.
Border point: A point is a border point if it is reachable from a core point and there are fewer than minPts points within its surrounding area.
Outlier: A point is an outlier if it is not a core point and not reachable from any core point.
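The three definitions above can be sketched in a few lines of Python. This is only a minimal illustration, not a full DBSCAN implementation: the function name classify_points and the sample points are made up for this example, Euclidean distance is assumed, and "reachable from a core point" is simplified to "lies within eps of a core point".

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier' (hypothetical helper)."""
    # For every point, collect the indices of all points within eps.
    # The point itself is included, since its distance to itself is 0.
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]

    # A core point has at least min_pts points in its eps-neighborhood.
    is_core = [len(n) >= min_pts for n in neighborhoods]

    labels = []
    for i, neighbors in enumerate(neighborhoods):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels


# Made-up sample data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=3)

for point, label in zip(points, labels):
    print(point, label)
```

With eps=1.0 and min_pts=3, the four corners of the square come out as core points, (2.0, 1.0) is a border point because its only neighbor is the core point (1.0, 1.0), and (5.0, 5.0) is an outlier. The sketch compares every pair of points, which is fine for a handful of points but slow for large datasets.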
These three types of points are easier to understand with visualizations.