Search Results for author: Minhak Song

Found 2 papers, 1 paper with code

Does SGD really happen in tiny subspaces?

no code implementations • 25 May 2024 • Minhak Song, Kwangjun Ahn, Chulhee Yun

This suggests that the observed alignment between the gradient and the dominant subspace is spurious.

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.
