CAST: Cluster-Aware Self-Training for Tabular Data

10 Oct 2023  ·  Minwook Kim, Juseong Kim, Ki Beom Kim, Giltae Song ·

Self-training has gained traction because of its simplicity and versatility, yet it is vulnerable to noisy pseudo-labels caused by erroneous confidence. Several solutions have been proposed to handle this problem, but they require significant modifications to self-training algorithms or model architectures, and most have limited applicability in tabular domains. To address this issue, we explore a novel direction of reliable confidence in self-training contexts and conclude that the confidence, which represents the value of the pseudo-label, should be aware of the cluster assumption. In this regard, we propose Cluster-Aware Self-Training (CAST) for tabular data, which enhances existing self-training algorithms at negligible cost and without significant modifications. Concretely, CAST regularizes the confidence of the classifier by leveraging the local density of each class in the labeled training data, forcing pseudo-labels in low-density regions to have lower confidence. Extensive empirical evaluations on up to 21 real-world datasets confirm not only the superior performance of CAST but also its robustness across various self-training setups.
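The density-based confidence regularization described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the Gaussian kernel density estimator, the `bandwidth` value, the per-class calibration via average self-density, and all function names are assumptions made for the example.

```python
import numpy as np

def class_density(x, X_class, bandwidth=0.5):
    """Mean Gaussian-kernel density of point x under one class's labeled samples."""
    sq = np.sum((X_class - x) ** 2, axis=1)
    return float(np.mean(np.exp(-sq / (2 * bandwidth ** 2))))

def reference_densities(X_by_class, bandwidth=0.5):
    """Average self-density per class, used to calibrate raw density values."""
    return [np.mean([class_density(x, Xc, bandwidth) for x in Xc])
            for Xc in X_by_class]

def cluster_aware_confidence(probs, x, X_by_class, refs, bandwidth=0.5):
    """Down-weight classifier confidence in low-density regions (hypothetical sketch)."""
    dens = np.array([class_density(x, Xc, bandwidth) / r
                     for Xc, r in zip(X_by_class, refs)])
    weights = np.clip(dens, 0.0, 1.0)  # ~1 inside a dense cluster, ~0 far away
    return probs * weights             # intentionally not renormalized

# Two well-separated labeled clusters (classes 0 and 1).
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
X1 = rng.normal([3.0, 3.0], 0.3, size=(50, 2))
refs = reference_densities([X0, X1])

probs = np.array([0.9, 0.1])  # classifier is confident in class 0
near = cluster_aware_confidence(probs, np.array([0.1, 0.0]), [X0, X1], refs)
far = cluster_aware_confidence(probs, np.array([1.5, 1.5]), [X0, X1], refs)
# near: the point sits inside the class-0 cluster, so confidence stays high;
# far: the point lies between clusters, so confidence collapses and the
# pseudo-label would be rejected by a confidence threshold.
```

In a self-training loop, an unlabeled point would only receive a pseudo-label when its regularized confidence exceeds a threshold, so low-density (likely noisy) pseudo-labels are filtered out.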
