Similarity-based Distance for Categorical Clustering using Space Structure

19 Nov 2020  ·  Utkarsh Nath, Shikha Asrani, Rahul Katarya

Clustering is the task of spotting patterns in a group of objects and grouping similar objects together. Objects have attributes that are not always numerical; sometimes an attribute instead takes its value from a domain of categories. Such data is called categorical data. Many clustering algorithms are used to group categorical data, among which the k-modes algorithm has so far given the most significant results. Nevertheless, there is still considerable room for improvement: algorithms such as k-means, fuzzy c-means, and hierarchical clustering achieve far better accuracy on numerical data. In this paper, we propose a novel distance metric, similarity-based distance (SBD), to measure the distance between objects in categorical data. Experiments show that SBD, when used with an SBC (space structure based clustering) type algorithm, significantly outperforms existing algorithms such as k-modes and other SBC-type algorithms on categorical datasets.
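For context on the baseline named in the abstract, the following is a minimal sketch of the simple matching dissimilarity and the per-attribute mode that k-modes is commonly described as using. It is not the proposed SBD, which the abstract does not define. The function names and the toy objects are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Mode of a cluster: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


# Hypothetical toy data: three objects with three categorical attributes each.
objects = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
]

mode = cluster_mode(objects)
distances = [matching_distance(obj, mode) for obj in objects]
```

Under this baseline every mismatch contributes the same amount to the distance regardless of which categories are involved, which is the kind of limitation a similarity-aware distance aims to address.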
