An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

22 Jan 2018  ·  Boonchoo Thapana, Ao Xiang, He Qing

DBSCAN is a widely used clustering algorithm because it can find arbitrarily shaped clusters and is robust to outliers. Its worst-case time complexity is O(n^2), and in practice its cost grows further as the dimensionality increases. Grid-based DBSCAN is a recent improvement aimed at better efficiency. However, grid-based DBSCAN still suffers from two problems, neighbour explosion and redundancies in merging, which make it infeasible in high-dimensional space. In this paper, we propose a novel algorithm named GDPAM that extends grid-based DBSCAN to higher-dimensional data. In GDPAM, a bitmap index is used to manage non-empty grids so that neighbour-grid queries can be performed efficiently. Furthermore, we adopt an efficient union-find algorithm to maintain the clustering information and reduce redundancies in merging. Experimental results on both real-world and synthetic datasets demonstrate that the proposed algorithm outperforms state-of-the-art exact and approximate DBSCAN algorithms and scales well.
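
The two ingredients named in the abstract, a bitmap index over non-empty grids and a union-find structure for merging, can be illustrated with a minimal Python sketch. This is not the GDPAM implementation: the abstract gives no details, so the cell side of eps/sqrt(d) (the usual choice in grid-based DBSCAN), the per-dimension bitmap layout, and all names here (`cell_of`, `build_bitmap_index`, `neighbour_cells`, `UnionFind`, `reach`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def cell_of(point, eps):
    """Map a point to its grid cell (assumed cell side: eps / sqrt(d))."""
    side = eps / math.sqrt(len(point))
    return tuple(math.floor(x / side) for x in point)


def build_bitmap_index(cells):
    """Index non-empty cells: index[dim][coord] is an int used as a bitmap,
    with bit i set when cell i has that coordinate in that dimension."""
    d = len(cells[0])
    index = [defaultdict(int) for _ in range(d)]
    for cell_id, cell in enumerate(cells):
        for dim, coord in enumerate(cell):
            index[dim][coord] |= 1 << cell_id
    return index


def neighbour_cells(index, cell, reach):
    """Return ids of non-empty cells within `reach` cells in every dimension.
    Per dimension, OR the bitmaps of the nearby coordinates; then AND
    the per-dimension results together."""
    result = -1  # all bits set
    for dim, coord in enumerate(cell):
        mask = 0
        for c in range(coord - reach, coord + reach + 1):
            mask |= index[dim].get(c, 0)
        result &= mask
    return [i for i in range(result.bit_length()) if result >> i & 1]


class UnionFind:
    """Disjoint sets with path compression (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge two sets; return False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


cells = [(0, 0), (1, 0), (5, 5), (0, 1)]  # non-empty cells, made-up data
index = build_bitmap_index(cells)
near = neighbour_cells(index, cells[0], reach=1)

uf = UnionFind()
merges = [uf.union(0, 1), uf.union(1, 3), uf.union(0, 3)]
```

In this sketch the neighbour query touches only bitmaps for coordinates that actually occur, rather than enumerating every adjacent cell, and `union` returning `False` marks a merge check that can be skipped because the two grids already belong to the same cluster. Whether GDPAM organises its index or merge order this way is not stated in the abstract.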
