Clustering is an important unsupervised knowledge acquisition method
that divides unlabeled data into groups
[atilgan2021efficient, d2021automatic]. Different clustering algorithms make different
assumptions about how clusters form. As a result, most clustering algorithms
handle at least one particular type of data distribution well
but may handle other types of distributions poorly.
For example, K-means identifies convex clusters well [bai2017fast],
and DBSCAN is able to find clusters with similar densities
[DBSCAN]. Consequently, most clustering methods may not work well on data
whose distribution pattern differs from the assumptions being made,
or on a mixture of different distribution patterns. DBSCAN, for
example, is sensitive to loosely connected points between dense
natural clusters, as illustrated in
Figure~LABEL:figconnect. The density
of the connecting points shown in
Figure~LABEL:figconnect differs
from that of the natural clusters at both ends. Nevertheless, DBSCAN with fixed
global parameter values may wrongly assign these connecting points and,
as a result, treat all the data points in
Figure~LABEL:figconnect as one big
cluster.
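
The merging behavior described above can be reproduced on a small synthetic example. The sketch below is illustrative only: the data set (two dense grids joined by a sparse row of points), the parameter values, and the brute-force `dbscan` helper are assumptions made for this illustration, not the data of Figure~LABEL:figconnect or a reference implementation of DBSCAN.

```python
import math
from collections import deque


def dbscan(points, eps, min_samples):
    """Minimal brute-force DBSCAN sketch; returns one label per point, -1 = noise."""
    n = len(points)
    # eps-neighborhood of every point (each neighborhood includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:  # only core points keep expanding the cluster
                        queue.append(q)
        cluster += 1
    return labels


# Hypothetical toy data: two dense 11 x 11 grids (spacing 0.1) ...
grid = [i / 10 for i in range(11)]
left = [(x, y) for x in grid for y in grid]
right = [(x + 3.0, y) for x in grid for y in grid]
# ... joined by a sparse bridge of 9 points (spacing 0.2) along y = 0.5.
bridge = [(1.2 + 0.2 * k, 0.5) for k in range(9)]
points = left + right + bridge

# One global (eps, min_samples) setting per run.
loose = dbscan(points, eps=0.25, min_samples=3)  # radius covers the bridge spacing
tight = dbscan(points, eps=0.15, min_samples=3)  # radius below the bridge spacing

for name, labels in (("eps=0.25", loose), ("eps=0.15", tight)):
    n_clusters = len(set(labels) - {-1})
    print(name, "clusters:", n_clusters, "noise points:", labels.count(-1))
```

With the looser radius, the bridge points qualify as core points and link the two grids into a single cluster. With the tighter radius, the grids are separated and the bridge points are labeled as noise. On this toy data a smaller global radius happens to suffice, which need not hold when the natural clusters themselves differ in density.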