2 code implementations • 16 Feb 2024 • Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
This paper addresses the challenge of processing long documents using generative transformer models.
1 code implementation • 27 Jul 2022 • Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev
This requires storing a prohibitively large amount of intermediate data when a sequence consists of thousands or even millions of elements and, as a result, makes learning very long-term dependencies infeasible.