How Gradient Descent Separates Data with Neural Collapse: A Layer-Peeled Perspective

In this paper, we develop a landscape analysis of a surrogate model to study the inductive bias of the last-layer features and classifiers of neural networks trained with the cross-entropy loss. We show that once the training cross-entropy loss decreases below a certain threshold, the features and classifiers in the last layer of the network converge to a particular geometric structure known as neural collapse \citep{papyan2020prevalence,fang2021layer}: the within-class variability of the last-layer features collapses to zero, and the class means converge to a Simplex Equiangular Tight Frame (ETF). We further show that the cross-entropy loss enjoys a benign global landscape: every critical point is either a global minimizer exhibiting the neural collapse phenomenon, or a strict saddle whose Hessian has a negative curvature direction.
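To make the Simplex ETF geometry concrete, the following is a minimal numerical sketch (not from the paper) using the standard construction: scale a centered identity matrix and rotate it into the feature space with a partial orthogonal matrix. The resulting K class-mean directions have equal norm and pairwise inner product exactly -1/(K-1), the maximally separated configuration. The function name `simplex_etf` is a hypothetical helper chosen for illustration.

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Return a d x K matrix whose columns form a Simplex ETF (K <= d)."""
    rng = np.random.default_rng(seed)
    # Partial orthogonal matrix P (d x K) with P^T P = I_K, via QR.
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Standard Simplex ETF construction: sqrt(K/(K-1)) * P (I - (1/K) 11^T).
    M = np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)
    return M

K, d = 4, 10
M = simplex_etf(K, d)
G = M.T @ M  # Gram matrix of the class-mean directions
# Diagonal entries are all 1 (equal norms); off-diagonal entries are
# all -1/(K-1) (equiangular, maximally separated).
```

Inspecting the Gram matrix `G` is a quick way to verify that a set of class means has collapsed to this geometry: equal diagonal entries and constant off-diagonal inner products of -1/(K-1) characterize the Simplex ETF up to rotation and scaling.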
