Search Results for author: Minhak Song

Found 2 papers, 1 paper with code

Does SGD really happen in tiny subspaces?

no code implementations • 25 May 2024 • Minhak Song, Kwangjun Ahn, Chulhee Yun

This suggests that the observed alignment between the gradient and the dominant subspace is spurious.
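For context, the "alignment" in question is the fraction of the gradient that lies in the span of the Hessian's top eigenvectors (the dominant subspace). Below is a minimal NumPy sketch of that measurement on a toy quadratic; it is illustrative only and does not reproduce the paper's experiments, and all names are made up for the example.

```python
import numpy as np

# Toy illustration: how much of a gradient lies in the span of the
# Hessian's top-k eigenvectors ("dominant subspace").
rng = np.random.default_rng(0)
d, k = 50, 5

# Hypothetical quadratic loss 0.5 * w^T H w with a fixed PSD Hessian H.
A = rng.standard_normal((d, d))
H = A @ A.T / d
w = rng.standard_normal(d)
grad = H @ w  # gradient of the quadratic at w

# Dominant subspace: top-k eigenvectors of H (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(H)
top_k = eigvecs[:, -k:]  # columns spanning the dominant subspace

# Fraction of the gradient's norm captured by the dominant subspace.
proj = top_k @ (top_k.T @ grad)
alignment = np.linalg.norm(proj) / np.linalg.norm(grad)
print(f"fraction of gradient in top-{k} eigenspace: {alignment:.3f}")
```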

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.
