no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).
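To make the idea of adaptive depth concrete, here is a minimal sketch of applying a shared transformer layer a variable number of times with an ACT-style halting score; the module names, halting rule, and hyperparameters are illustrative assumptions, not the architecture studied in the paper.

```python
import torch
import torch.nn as nn

class AdaptiveDepthEncoder(nn.Module):
    """Illustrative sketch: one shared transformer layer applied repeatedly,
    with a halting score deciding how many steps to run (an assumption,
    not the paper's actual model)."""

    def __init__(self, d_model=64, nhead=4, max_steps=8, halt_threshold=0.99):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.halt = nn.Linear(d_model, 1)   # per-step halting score
        self.max_steps = max_steps
        self.halt_threshold = halt_threshold

    def forward(self, x):
        # x: (batch, seq, d_model); depth adapts to the accumulated halting mass
        cum_halt = torch.zeros(x.size(0), x.size(1), device=x.device)
        for _ in range(self.max_steps):
            x = self.layer(x)
            cum_halt = cum_halt + torch.sigmoid(self.halt(x)).squeeze(-1)
            if bool(cum_halt.mean() > self.halt_threshold):
                break  # stop early once enough halting mass has accumulated
        return x

enc = AdaptiveDepthEncoder()
out = enc(torch.randn(2, 10, 64))  # number of applied steps varies with the input
```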
no code implementations • 1 Mar 2023 • Peiye Zhuang, Samira Abnar, Jiatao Gu, Alex Schwing, Joshua M. Susskind, Miguel Ángel Bautista
Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
There has been considerable interest in the scaling properties of Transformer models.
no code implementations • ICLR 2022 • Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks.
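As a toy illustration of studying how upstream (pre-training) improvements relate to downstream performance, the sketch below fits a saturating curve to hypothetical (upstream accuracy, downstream accuracy) pairs; the data points, functional form, and parameters are assumptions for illustration only, not the paper's data or model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical upstream/downstream accuracy pairs (placeholders, not real results).
upstream = np.array([0.55, 0.62, 0.70, 0.76, 0.81, 0.85])
downstream = np.array([0.40, 0.52, 0.61, 0.66, 0.68, 0.69])

def saturating(x, ceiling, rate, offset):
    # Downstream gains shrink as upstream accuracy approaches a ceiling.
    return ceiling - offset * np.exp(-rate * x)

params, _ = curve_fit(saturating, upstream, downstream, p0=[0.7, 5.0, 5.0])
print("estimated downstream ceiling:", params[0])
```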
no code implementations • 29 Sep 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It is shown that under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.
no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-Base and T5-Large sizes are Pareto-inefficient.
3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-Base and T5-Large sizes are Pareto-inefficient.
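To illustrate why "model shape" is distinct from parameter count, the sketch below compares rough parameter counts for a deeper-and-narrower versus a shallower-and-wider encoder at a roughly comparable budget; the counting formula and sizes are illustrative approximations, not T5's exact configurations.

```python
def transformer_params(layers, d_model, d_ff=None, vocab=32000):
    """Rough parameter count for an encoder stack plus embeddings; the formula
    (attention projections + feed-forward per layer) is a common approximation,
    not T5's exact count."""
    d_ff = d_ff or 4 * d_model
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff  # QKVO projections + FFN
    return layers * per_layer + vocab * d_model

# Two shapes with roughly comparable budgets but different depth/width trade-offs.
print(transformer_params(layers=24, d_model=768))    # deeper and narrower
print(transformer_params(layers=12, d_model=1024))   # shallower and wider
```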
1 code implementation • 10 Jun 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It has been shown that under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.
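A minimal sketch of the gradual self-training loop described above, assuming intermediate domains ordered by their amount of shift from the source; the classifier, toy data, and helper names are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(model, domains):
    """Sketch: pseudo-label each gradually shifted domain with the current model,
    then retrain on those pseudo-labels before moving to the next domain."""
    for X_shifted in domains:
        pseudo_labels = model.predict(X_shifted)                      # label the next domain
        model = LogisticRegression().fit(X_shifted, pseudo_labels)    # retrain on pseudo-labels
    return model

# Toy usage: a source classifier adapted along three gradually shifted domains.
rng = np.random.default_rng(0)
X_src = rng.normal(0, 1, (200, 2))
y_src = (X_src[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_src, y_src)
shifted = [X_src + rng.normal(0.4 * k, 0.1, (200, 2)) for k in range(1, 4)]
model = gradual_self_train(model, shifted)
```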
no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.
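The quadratic cost comes from the seq_len x seq_len attention score matrix; the short sketch below only counts score entries to show how they grow with sequence length (an illustration of the scaling, not a benchmark of any particular model).

```python
import torch

def attention_score_entries(seq_len, d_model=64):
    """Standard self-attention forms a (seq_len, seq_len) score matrix,
    so memory and compute grow quadratically with sequence length."""
    q = torch.randn(seq_len, d_model)
    k = torch.randn(seq_len, d_model)
    scores = q @ k.T / d_model ** 0.5   # (seq_len, seq_len)
    return scores.numel()

for n in (512, 1024, 2048, 4096):
    print(n, attention_score_entries(n))  # 4x the length -> 16x the score entries
```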
5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.
Ranked #18 on Long-range modeling on LRA (Pathfinder metric)
1 code implementation • 31 May 2020 • Samira Abnar, Mostafa Dehghani, Willem Zuidema
Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time.
7 code implementations • ACL 2020 • Samira Abnar, Willem Zuidema
This makes attention weights unreliable as explanation probes.
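One way to go beyond raw attention weights is to propagate attention through the layer stack while accounting for residual connections (an attention-rollout style computation); the sketch below is a simplified illustration under that assumption, not the paper's exact formulation.

```python
import numpy as np

def attention_rollout(attentions):
    """Sketch of attention rollout: combine per-layer attention maps by matrix
    multiplication, mixing in the identity to account for residual connections."""
    rollout = np.eye(attentions[0].shape[0])
    for A in attentions:                                  # A: (seq_len, seq_len), rows sum to 1
        A_res = 0.5 * A + 0.5 * np.eye(A.shape[0])        # add the residual path
        A_res = A_res / A_res.sum(axis=-1, keepdims=True) # renormalize rows
        rollout = A_res @ rollout                         # propagate through the stack
    return rollout                                        # token-to-token influence across layers

# Toy usage with random row-stochastic attention maps for a 3-layer model.
rng = np.random.default_rng(0)
layers = [rng.random((5, 5)) for _ in range(3)]
layers = [A / A.sum(axis=-1, keepdims=True) for A in layers]
print(attention_rollout(layers))
```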
no code implementations • 15 Dec 2019 • Niels van der Heijden, Samira Abnar, Ekaterina Shutova
The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP).
1 code implementation • WS 2019 • Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema
In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.
1 code implementation • 4 Jun 2019 • Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema
In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.
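As a rough illustration of comparing a model's representations under a change in one experimental parameter (e.g., the amount of preceding context), the sketch below uses linear CKA as a stand-in similarity measure on placeholder arrays; the metric choice and data are assumptions, not the paper's exact method.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (samples x features);
    one common similarity measure, used here only as an illustrative stand-in."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Stability of representations when one parameter changes; arrays are placeholders.
rng = np.random.default_rng(0)
reps_short_context = rng.normal(size=(50, 128))
reps_long_context = reps_short_context + 0.1 * rng.normal(size=(50, 128))
print("stability score:", linear_cka(reps_short_context, reps_long_context))
```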
1 code implementation • 4 Apr 2019 • Lisa Beinborn, Samira Abnar, Rochelle Choenni
Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli.
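A typical encoding setup of this kind maps language-model features to recorded brain responses with a regularized linear model and scores predictions per voxel; the sketch below, using ridge regression on random placeholder arrays, is an assumed illustration rather than the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical shapes: 100 stimuli, 300-dim language-model features, 500 voxels.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 300))   # language-model representations of the stimuli
brain = rng.normal(size=(100, 500))      # recorded responses (e.g., fMRI voxels)

X_tr, X_te, y_tr, y_te = train_test_split(features, brain, test_size=0.2, random_state=0)
encoder = Ridge(alpha=10.0).fit(X_tr, y_tr)   # linear encoding model
pred = encoder.predict(X_te)

# Evaluate per voxel: correlation between predicted and observed responses.
corrs = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(brain.shape[1])]
print("mean voxel-wise correlation:", float(np.mean(corrs)))
```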
no code implementations • 15 Jan 2019 • Samira Abnar, Tania Bedrax-Weiss, Tom Kwiatkowski, William W. Cohen
Current state-of-the-art question answering models reason over an entire passage, not incrementally.
no code implementations • WS 2018 • Samira Abnar, Rasyan Ahmed, Max Mijnheer, Willem Zuidema
We evaluate 8 different word embedding models on their usefulness for predicting the neural activation patterns associated with concrete nouns.