Search Results for author: Róbert Csordás

Found 19 papers, 16 papers with code

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

1 code implementation • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber

The costly self-attention layers in modern Transformers require memory and compute quadratic in sequence length.

Language Modelling

Automating Continual Learning

1 code implementation • 1 Dec 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments.

Continual Learning, Image Classification (and 2 more)

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

1 code implementation • 24 Oct 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions.

Approximating Two-Layer Feedforward Networks for Efficient Transformers

2 code implementations • 16 Oct 2023 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Unlike prior work, which compares MoEs with dense baselines under a compute-equal condition, our evaluation uses a parameter-equal condition, which is crucial for properly evaluating LMs.

Randomized Positional Encodings Boost Length Generalization of Transformers

1 code implementation • 26 May 2023 • Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness

Transformers have impressive generalization capabilities on tasks with a fixed context length.

Mindstorms in Natural Language-Based Societies of Mind

no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber

What should be the social structure of a natural language-based society of mind (NLSOM)?

3D Generation, Image Captioning (and 2 more)

Topological Neural Discrete Representation Learning à la Kohonen

1 code implementation • 15 Feb 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Unsupervised learning of discrete representations from continuous ones in neural networks (NNs) is the cornerstone of several applications today.

Representation Learning

CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations

1 code implementation • 12 Oct 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions.

A Generalist Neural Algorithmic Learner

2 code implementations • 22 Sep 2022 • Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković

The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution.

Learning to Execute

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

1 code implementation • 11 Feb 2022 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience.

Continual Learning, Image Classification (and 1 more)
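
The dual-form statement in the snippet above can be checked numerically on a toy case. The following is a minimal pure-Python sketch, assuming a single bias-free linear layer trained by per-example SGD on a squared-error loss; the loss, learning rate, random data, and helper names (`train_linear_layer`, `dual_form`) are illustrative assumptions, not the paper's code. Because every SGD step adds a rank-one term `outer(e_t, x_t)` to the weights, the trained layer's output on a query equals the initial weights' output plus unnormalised dot attention whose keys are the training inputs `x_t` and whose values are the scaled error signals `e_t`.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_layer(W0, xs, ys, lr):
    """Per-example SGD on 0.5 * ||W x - y||^2 (assumed loss) for a bias-free layer.

    Returns the final weights and, for each step t, the value vector
    e_t = -lr * dL/d(output), so that each update is W += outer(e_t, x_t).
    """
    W = [row[:] for row in W0]
    values = []
    for x, y in zip(xs, ys):
        out = matvec(W, x)
        e = [-lr * (o - t) for o, t in zip(out, y)]  # scaled error signal
        values.append(e)
        for i, e_i in enumerate(e):
            for j, x_j in enumerate(x):
                W[i][j] += e_i * x_j  # rank-one (outer-product) update
    return W, values


def dual_form(W0, keys, values, query):
    """W0 q + sum_t v_t * (k_t . q): unnormalised dot attention over training inputs."""
    out = matvec(W0, query)
    for k, v in zip(keys, values):
        score = dot(k, query)  # unnormalised attention weight
        out = [o + v_i * score for o, v_i in zip(out, v)]
    return out


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
d_in, d_out, steps = 3, 2, 5
W0 = [rand_vec(d_in) for _ in range(d_out)]
xs = [rand_vec(d_in) for _ in range(steps)]
ys = [rand_vec(d_out) for _ in range(steps)]
W, values = train_linear_layer(W0, xs, ys, lr=0.1)

query = rand_vec(d_in)
primal = matvec(W, query)                # ordinary forward pass with trained weights
dual = dual_form(W0, xs, values, query)  # attention over the stored training inputs
print(max(abs(p - d) for p, d in zip(primal, dual)))  # differs only by rounding error
```

The two computations agree because the trained weight matrix is exactly `W0` plus the sum of the stored outer products; the dual form simply keeps those terms separate instead of summing them into one matrix.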

A Modern Self-Referential Weight Matrix That Learns to Modify Itself

2 code implementations • 11 Feb 2022 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

The weight matrix (WM) of a neural network (NN) is its program.

Few-Shot Learning

Improving Baselines in the Wild

1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

1 code implementation • 14 Oct 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Despite progress across a broad range of applications, Transformers have limited success in systematic generalization.

ListOps, Systematic Generalization

Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization

no code implementations • NeurIPS Workshop AIPLANS 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.

Systematic Generalization

Adaptive Control Flow in Transformers Improves Systematic Generalization

no code implementations • ICLR 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.

ListOps, Systematic Generalization

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

1 code implementation • EMNLP 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS.

Systematic Generalization

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer-product-based Fast Weight Programmers (FWPs) from the '90s.

Atari Games, ListOps
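
The link in the snippet above between linear Transformers and outer-product Fast Weight Programmers can be illustrated numerically. The following is a minimal pure-Python sketch, assuming causal, unnormalised linear attention and the feature map phi(x) = elu(x) + 1 (both chosen here for illustration); the function names are hypothetical, and the recurrent FWP extensions the paper proposes are not shown. The attention form sums over all past keys for each query, while the FWP form writes each key/value pair into a fast weight matrix with an outer product and reads it with the query; both give the same outputs.

```python
import math
import random


def phi(x):
    """elu(x) + 1, applied elementwise: a positive feature map (assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_attention(qs, ks, vs):
    """Causal, unnormalised linear attention: y_t = sum_{s<=t} v_s * (phi(k_s) . phi(q_t))."""
    outs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        y = [0.0] * len(vs[0])
        for s in range(t + 1):  # attends to all past positions: O(T^2) overall
            score = dot(phi(ks[s]), fq)
            y = [yi + vi * score for yi, vi in zip(y, vs[s])]
        outs.append(y)
    return outs


def fast_weight_programmer(qs, ks, vs):
    """Outer-product FWP: W_t = W_{t-1} + v_t phi(k_t)^T, then y_t = W_t phi(q_t)."""
    d_v, d_k = len(vs[0]), len(ks[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # fast weights: fixed-size state
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * fk[j]  # write: rank-one update
        fq = phi(q)
        outs.append([dot(row, fq) for row in W])  # read with the query
    return outs


def rand_seq(n, d):
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


random.seed(1)
T, d_k, d_v = 4, 3, 2
qs, ks, vs = rand_seq(T, d_k), rand_seq(T, d_k), rand_seq(T, d_v)
attn = linear_attention(qs, ks, vs)
fwp = fast_weight_programmer(qs, ks, vs)
max_diff = max(abs(a - b) for ra, rb in zip(attn, fwp) for a, b in zip(ra, rb))
print(max_diff)  # differs only by rounding error
```

The attention form recomputes scores against every past position, whereas the FWP form carries a fixed d_v × d_k state from step to step, which is what makes the recurrent view attractive for long sequences. The attention normalisation term is omitted from this sketch.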

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

1 code implementation • ICLR 2021 • Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc.

Systematic Generalization

Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control

1 code implementation • 23 Apr 2019 • Róbert Csordás, Jürgen Schmidhuber

The Differentiable Neural Computer (DNC) can learn algorithmic and question answering tasks.

Question Answering
