no code implementations • 29 May 2024 • Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini
For decoder-only topologies, we achieve 16.1x speedup in the Non-Autoregressive (NAR) mode and up to 35.6x speedup in the Autoregressive (AR) mode compared to the baseline implementation.
1 code implementation • 10 Jan 2023 • Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti
The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only.