no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).
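To make the idea of adaptive depth concrete, here is a minimal sketch of applying a shared transformer layer a variable number of times with an ACT-style halting score; the module names, halting rule, and hyperparameters are illustrative assumptions, not the architecture studied in the paper.

```python
import torch
import torch.nn as nn

class AdaptiveDepthEncoder(nn.Module):
    """Illustrative sketch: one shared transformer layer applied repeatedly,
    with a halting score deciding how many steps to run (an assumption,
    not the paper's actual model)."""

    def __init__(self, d_model=64, nhead=4, max_steps=8, halt_threshold=0.99):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.halt = nn.Linear(d_model, 1)   # per-step halting score
        self.max_steps = max_steps
        self.halt_threshold = halt_threshold

    def forward(self, x):
        # x: (batch, seq, d_model); depth adapts to the accumulated halting mass
        cum_halt = torch.zeros(x.size(0), x.size(1), device=x.device)
        for _ in range(self.max_steps):
            x = self.layer(x)
            cum_halt = cum_halt + torch.sigmoid(self.halt(x)).squeeze(-1)
            if bool(cum_halt.mean() > self.halt_threshold):
                break  # stop early once enough halting mass has accumulated
        return x

enc = AdaptiveDepthEncoder()
out = enc(torch.randn(2, 10, 64))  # number of applied steps varies with the input
```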
no code implementations • 1 Mar 2023 • Peiye Zhuang, Samira Abnar, Jiatao Gu, Alex Schwing, Joshua M. Susskind, Miguel Ángel Bautista
Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
There has been considerable interest in the scaling properties of Transformer models.
no code implementations • ICLR 2022 • Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks.
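As a toy illustration of studying how upstream (pre-training) improvements relate to downstream performance, the sketch below fits a saturating curve to hypothetical (upstream accuracy, downstream accuracy) pairs; the data points, functional form, and parameters are assumptions for illustration only, not the paper's data or model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical upstream/downstream accuracy pairs (placeholders, not real results).
upstream = np.array([0.55, 0.62, 0.70, 0.76, 0.81, 0.85])
downstream = np.array([0.40, 0.52, 0.61, 0.66, 0.68, 0.69])

def saturating(x, ceiling, rate, offset):
    # Downstream gains shrink as upstream accuracy approaches a ceiling.
    return ceiling - offset * np.exp(-rate * x)

params, _ = curve_fit(saturating, upstream, downstream, p0=[0.7, 5.0, 5.0])
print("estimated downstream ceiling:", params[0])
```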
no code implementations • 29 Sep 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It is shown that under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.
no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-Base and T5-Large sizes are Pareto-inefficient.
3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-Base and T5-Large sizes are Pareto-inefficient.
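To illustrate why "model shape" is distinct from parameter count, the sketch below compares rough parameter counts for a deeper-and-narrower versus a shallower-and-wider encoder at a roughly comparable budget; the counting formula and sizes are illustrative approximations, not T5's exact configurations.

```python
def transformer_params(layers, d_model, d_ff=None, vocab=32000):
    """Rough parameter count for an encoder stack plus embeddings; the formula
    (attention projections + feed-forward per layer) is a common approximation,
    not T5's exact count."""
    d_ff = d_ff or 4 * d_model
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff  # QKVO projections + FFN
    return layers * per_layer + vocab * d_model

# Two shapes with roughly comparable budgets but different depth/width trade-offs.
print(transformer_params(layers=24, d_model=768))    # deeper and narrower
print(transformer_params(layers=12, d_model=1024))   # shallower and wider
```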
1 code implementation • 10 Jun 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It has been shown that under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.
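A minimal sketch of the gradual self-training loop described above, assuming intermediate domains ordered by their amount of shift from the source; the classifier, toy data, and helper names are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(model, domains):
    """Sketch: pseudo-label each gradually shifted domain with the current model,
    then retrain on those pseudo-labels before moving to the next domain."""
    for X_shifted in domains:
        pseudo_labels = model.predict(X_shifted)                      # label the next domain
        model = LogisticRegression().fit(X_shifted, pseudo_labels)    # retrain on pseudo-labels
    return model

# Toy usage: a source classifier adapted along three gradually shifted domains.
rng = np.random.default_rng(0)
X_src = rng.normal(0, 1, (200, 2))
y_src = (X_src[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_src, y_src)
shifted = [X_src + rng.normal(0.4 * k, 0.1, (200, 2)) for k in range(1, 4)]
model = gradual_self_train(model, shifted)
```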
no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.
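The quadratic cost comes from the seq_len x seq_len attention score matrix; the short sketch below only counts score entries to show how they grow with sequence length (an illustration of the scaling, not a benchmark of any particular model).

```python
import torch

def attention_score_entries(seq_len, d_model=64):
    """Standard self-attention forms a (seq_len, seq_len) score matrix,
    so memory and compute grow quadratically with sequence length."""
    q = torch.randn(seq_len, d_model)
    k = torch.randn(seq_len, d_model)
    scores = q @ k.T / d_model ** 0.5   # (seq_len, seq_len)
    return scores.numel()

for n in (512, 1024, 2048, 4096):
    print(n, attention_score_entries(n))  # 4x the length -> 16x the score entries
```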
5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.
Ranked #18 on Long-range modeling on LRA (Pathfinder metric)
1 code implementation • 31 May 2020 • Samira Abnar, Mostafa Dehghani, Willem Zuidema
Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time.
7 code implementations • ACL 2020 • Samira Abnar, Willem Zuidema
This makes attention weights unreliable as explanation probes.
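One way to go beyond raw attention weights is to propagate attention through the layer stack while accounting for residual connections (an attention-rollout style computation); the sketch below is a simplified illustration under that assumption, not the paper's exact formulation.

```python
import numpy as np

def attention_rollout(attentions):
    """Sketch of attention rollout: combine per-layer attention maps by matrix
    multiplication, mixing in the identity to account for residual connections."""
    rollout = np.eye(attentions[0].shape[0])
    for A in attentions:                                  # A: (seq_len, seq_len), rows sum to 1
        A_res = 0.5 * A + 0.5 * np.eye(A.shape[0])        # add the residual path
        A_res = A_res / A_res.sum(axis=-1, keepdims=True) # renormalize rows
        rollout = A_res @ rollout                         # propagate through the stack
    return rollout                                        # token-to-token influence across layers

# Toy usage with random row-stochastic attention maps for a 3-layer model.
rng = np.random.default_rng(0)
layers = [rng.random((5, 5)) for _ in range(3)]
layers = [A / A.sum(axis=-1, keepdims=True) for A in layers]
print(attention_rollout(layers))
```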
no code implementations • 15 Dec 2019 • Niels van der Heijden, Samira Abnar, Ekaterina Shutova
The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP).
1 code implementation • WS 2019 • Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema
In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.
1 code implementation • 4 Jun 2019 • Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema
In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.
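As a rough illustration of comparing a model's representations under a change in one experimental parameter (e.g., the amount of preceding context), the sketch below uses linear CKA as a stand-in similarity measure on placeholder arrays; the metric choice and data are assumptions, not the paper's exact method.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (samples x features);
    one common similarity measure, used here only as an illustrative stand-in."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Stability of representations when one parameter changes; arrays are placeholders.
rng = np.random.default_rng(0)
reps_short_context = rng.normal(size=(50, 128))
reps_long_context = reps_short_context + 0.1 * rng.normal(size=(50, 128))
print("stability score:", linear_cka(reps_short_context, reps_long_context))
```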
1 code implementation • 4 Apr 2019 • Lisa Beinborn, Samira Abnar, Rochelle Choenni
Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli.
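A typical encoding setup of this kind maps language-model features to recorded brain responses with a regularized linear model and scores predictions per voxel; the sketch below, using ridge regression on random placeholder arrays, is an assumed illustration rather than the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical shapes: 100 stimuli, 300-dim language-model features, 500 voxels.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 300))   # language-model representations of the stimuli
brain = rng.normal(size=(100, 500))      # recorded responses (e.g., fMRI voxels)

X_tr, X_te, y_tr, y_te = train_test_split(features, brain, test_size=0.2, random_state=0)
encoder = Ridge(alpha=10.0).fit(X_tr, y_tr)   # linear encoding model
pred = encoder.predict(X_te)

# Evaluate per voxel: correlation between predicted and observed responses.
corrs = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(brain.shape[1])]
print("mean voxel-wise correlation:", float(np.mean(corrs)))
```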
no code implementations • 15 Jan 2019 • Samira Abnar, Tania Bedrax-Weiss, Tom Kwiatkowski, William W. Cohen
Current state-of-the-art question answering models reason over an entire passage, not incrementally.
no code implementations • WS 2018 • Samira Abnar, Rasyan Ahmed, Max Mijnheer, Willem Zuidema
We evaluate 8 different word embedding models on their usefulness for predicting the neural activation patterns associated with concrete nouns.