no code implementations • 10 Jun 2024 • Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel
Unlike most related work, our method (i) is inference-efficient, leading to no additional overhead compared to traditional PTQ; (ii) can be seen as a general extended pretraining framework, meaning that the resulting model can still be utilized for any downstream task afterwards; (iii) can be applied across a wide range of quantization settings, such as different choices of quantization granularity and activation quantization, and can be seamlessly combined with many PTQ techniques.
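To make the notion of "quantization granularity" concrete, here is a minimal PyTorch sketch contrasting per-tensor and per-channel affine quantization of a weight matrix. This is a generic illustration, not the paper's method; the `quantize` helper and its signature are hypothetical.

```python
import torch

def quantize(w: torch.Tensor, num_bits: int = 8, per_channel: bool = False):
    """Affine (asymmetric) fake quantization of a weight tensor.

    per_channel=False -> one scale/zero-point for the whole tensor;
    per_channel=True  -> one scale/zero-point per output channel (dim 0).
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    dims = tuple(range(1, w.dim())) if per_channel else None
    w_min = w.amin(dim=dims, keepdim=True) if dims else w.min()
    w_max = w.amax(dim=dims, keepdim=True) if dims else w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale).clamp(qmin, qmax)
    w_q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (w_q - zero_point) * scale  # dequantized ("fake-quantized") weights

w = torch.randn(64, 128)
err_tensor = (w - quantize(w)).abs().mean()
err_channel = (w - quantize(w, per_channel=True)).abs().mean()
print(f"per-tensor error: {err_tensor:.5f}, per-channel error: {err_channel:.5f}")
```

Finer granularity (per-channel) typically yields lower quantization error at a small bookkeeping cost, which is why it is one of the settings a quantization framework should support.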
1 code implementation • NeurIPS 2020 • Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost Van de Weijer
We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems.
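For context on what "weight regularization" means in continual learning, the sketch below applies a generic EWC-style quadratic penalty (importance-weighted drift from the previous task's parameters) to a recurrent model. This is not RATT itself; the model, loss, and placeholder Fisher importances are assumptions for illustration.

```python
import torch
import torch.nn as nn

def ewc_penalty(model: nn.Module, old_params: dict, fisher: dict) -> torch.Tensor:
    """EWC-style penalty: importance-weighted squared drift from the
    parameters learned on the previous task."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in old_params:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return loss

# Illustrative recurrent model for a sequence task.
model = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# Real EWC estimates diagonal Fisher information from gradients on the
# previous task; uniform importances stand in for that here.
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

x = torch.randn(8, 20, 32)
out, _ = model(x)
task_loss = out.pow(2).mean()  # stand-in for the new task's loss
lam = 100.0                    # regularization strength
total_loss = task_loss + lam * ewc_penalty(model, old_params, fisher)
total_loss.backward()
```

The penalty discourages parameters that were important for earlier tasks from drifting while the recurrent network trains on a new one, which is the basic mechanism such regularization-based approaches adapt to the recurrent setting.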