no code implementations • 11 Dec 2023 • Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman
We use the group Fourier transform over the symmetric group $S_n$ to reverse engineer a 1-layer feedforward network that has "grokked" the multiplication of $S_5$ and $S_6$.
no code implementations • 24 Oct 2023 • Qinan Yu, Jack Merullo, Ellie Pavlick
By scaling the value vectors of these heads up or down, we can control the likelihood of using the in-context answer on new data.
1 code implementation • 17 Jan 2023 • Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick
However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts.
1 code implementation • 20 Dec 2022 • Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick
In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating "cube behind sphere" from "sphere behind cube").