no code implementations • 28 Apr 2022 • Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos
We introduce a software-hardware co-design approach to reduce memory traffic and footprint during training with BFloat16 or FP32 boosting energy efficiency and execution time performance.
no code implementations • 23 Mar 2022 • Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos
Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids.