Sparse representation-based over-sampling technique for classification of imbalanced dataset

As one of the most popular research fields in machine learning, research on imbalanced datasets has received increasing attention in recent years. The imbalanced problem usually occurs when minority classes have far fewer samples than the others. Traditional classification algorithms do not take the distribution of the dataset into consideration, so they fail to handle class-imbalanced learning, and classification performance tends to be dominated by the majority class. SMOTE is one of the most effective over-sampling methods for this problem: it changes the distribution of the training set by increasing the size of the minority class. However, SMOTE can easily lead to over-fitting because it produces many repetitive data samples. To address this issue, this paper proposes an improved method based on sparse representation theory and over-sampling, named SROT (Sparse Representation-based Over-sampling Technique). SROT uses a sparse dictionary to create synthetic samples directly, solving the imbalanced problem. Experiments are performed on 10 UCI datasets using C4.5 as the learning algorithm. The experimental results show that, compared with Random Over-sampling, SMOTE, and other methods, SROT achieves better performance in terms of AUC value.
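The abstract does not detail SROT's sparse-dictionary construction, but the baseline it improves on, SMOTE, has a well-known core step: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority-class neighbours. The sketch below illustrates only that generic SMOTE idea (function name, toy data, and the simple brute-force neighbour search are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between each sample and one of its k nearest minority-class
    neighbours (the core idea behind SMOTE; not the paper's SROT)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)              # pick a random minority sample
        j = nn[i, rng.integers(k)]       # and one of its k neighbours
        gap = rng.random()               # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# Toy minority class: 5 points in 2-D
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5]])
X_new = smote_like_oversample(X_min, n_new=10, k=2, rng=0)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point lies on a segment between two existing minority points, this enlarges the minority class without leaving its local neighbourhood; the over-fitting risk the abstract mentions arises when many such points cluster around the same originals.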
