no code implementations • 17 Apr 2024 • Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon
Therefore, significant effort has been devoted to studying and optimizing GEMM operators in order to speed up the execution of ML models.
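For context, a GEMM (general matrix-matrix multiplication) computes C <- alpha*A@B + beta*C, and most dense DNN layers reduce to it. The toy NumPy sketch below only illustrates the operation these optimizations target; it is not from the paper.

```python
import numpy as np

def gemm(A, B, C, alpha=1.0, beta=0.0):
    """Reference GEMM: C <- alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C

# A fully-connected layer is a single GEMM over a batch of activations.
A = np.random.rand(128, 512)   # activations: batch x in_features
B = np.random.rand(512, 256)   # weights: in_features x out_features
C = np.zeros((128, 256))
out = gemm(A, B, C)
```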
1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna
In this work, we study the effectiveness of existing sparse training recipes at high-sparsity regions and argue that these methods fail to sustain the model quality on par with low-sparsity regions.
1 code implementation • 27 Apr 2023 • Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research.
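As a rough picture of the kind of functionality such a library provides, the sketch below applies one-shot global magnitude pruning to a dictionary of weight arrays. It is a generic NumPy sketch of magnitude pruning, not the JaxPruner API.

```python
import numpy as np

def magnitude_prune(params, sparsity=0.9):
    """Zero out the smallest-magnitude weights globally.
    Illustrative sketch only; not the JaxPruner API."""
    all_weights = np.concatenate([np.abs(p).ravel() for p in params.values()])
    threshold = np.quantile(all_weights, sparsity)
    return {name: np.where(np.abs(p) >= threshold, p, 0.0)
            for name, p in params.items()}

params = {"dense/kernel": np.random.randn(512, 256),
          "head/kernel": np.random.randn(256, 10)}
sparse_params = magnitude_prune(params, sparsity=0.9)
```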
1 code implementation • 7 Oct 2022 • Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator.
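As a toy picture of what a map space is, the sketch below enumerates tile sizes for a single GEMM-like loop nest and keeps the mapping with the best score under a made-up cost model. The cost function and search space are illustrative assumptions, not the paper's method or tools.

```python
import itertools

def cost(tile_m, tile_n, tile_k, M=1024, N=1024, K=1024, buffer_bytes=64 * 1024):
    """Made-up cost proxy: reject mappings whose tiles overflow the on-chip buffer,
    otherwise prefer mappings with fewer tile iterations."""
    tile_bytes = 4 * (tile_m * tile_k + tile_k * tile_n + tile_m * tile_n)
    if tile_bytes > buffer_bytes:
        return float("inf")
    return (M // tile_m) * (N // tile_n) * (K // tile_k)

tiles = [16, 32, 64, 128]
best = min(itertools.product(tiles, repeat=3), key=lambda t: cost(*t))
print("best (tile_m, tile_n, tile_k):", best)
```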
no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna
In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs).
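For a concrete picture of N:M sparsity, the snippet below builds a 2:4 mask (keep the 2 largest-magnitude weights in every group of 4 consecutive weights). This is a plain NumPy illustration of the sparsity pattern, not one of the training recipes evaluated in the paper.

```python
import numpy as np

def nm_mask(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m (N:M sparsity)."""
    w = weights.reshape(-1, m)                        # groups of m consecutive weights
    keep = np.argsort(np.abs(w), axis=1)[:, -n:]      # indices of the n largest per group
    mask = np.zeros_like(w)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return mask.reshape(weights.shape)

w = np.random.randn(8, 16)           # last dim must be divisible by m
w_sparse = w * nm_mask(w, n=2, m=4)  # 2:4 structured-sparse weights
```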
no code implementations • 26 Jan 2022 • Sheng-Chun Kao, Xiaoyu Huang, Tushar Krishna
The dataflow/mapping determines the compute and energy efficiency of DNN accelerators.
2 code implementations • 26 Jan 2022 • Sheng-Chun Kao, Michael Pellauer, Angshuman Parashar, Tushar Krishna
The design of DNN accelerators includes two key parts: HW resource configuration and mapping strategy.
no code implementations • 13 Jul 2021 • Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, Tushar Krishna
In contrast, FLAT unblocks transformer models for inputs with up to 64K elements.
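The bottleneck being removed is the quadratic attention score matrix; the back-of-the-envelope calculation below (my arithmetic, not a figure from the paper) shows why materializing it naively is infeasible at 64K tokens.

```python
# Size of one attention score matrix if materialized naively (fp32, one head).
seq_len = 64 * 1024
bytes_per_elem = 4
score_matrix_gib = seq_len * seq_len * bytes_per_elem / 2**30
print(f"{score_matrix_gib:.0f} GiB per head")  # 16 GiB, per head and per layer
```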
no code implementations • 28 Apr 2021 • Sheng-Chun Kao, Tushar Krishna
In particular, we focus on the problem of mapping jobs from several DNNs simultaneously on an accelerator.
1 code implementation • 4 Sep 2020 • Sheng-Chun Kao, Geonhwa Jeong, Tushar Krishna
We also augment the RL approach with a genetic algorithm for further fine-tuning.
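A minimal sketch of that hybrid idea: seed a genetic algorithm's population with the RL-found design point and mutate/select around it. The encoding and fitness function here are placeholders, not the paper's setup.

```python
import random

def ga_finetune(rl_solution, fitness, generations=50, pop_size=20, mutate_p=0.1):
    """Genetic-algorithm fine-tuning seeded from an RL-found solution (sketch)."""
    def mutate(sol):
        return [g if random.random() > mutate_p else g + random.choice([-1, 1])
                for g in sol]

    population = [mutate(rl_solution) for _ in range(pop_size)] + [rl_solution]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]   # keep the fittest half
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=fitness)

# Toy usage: maximize a placeholder fitness around an RL-proposed design point.
best = ga_finetune(rl_solution=[4, 8, 2],
                   fitness=lambda s: -sum((g - 6) ** 2 for g in s))
```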
no code implementations • 6 Jun 2020 • Sheng-Chun Kao, Arun Ramamurthy, Reed Williams, Tushar Krishna
Designing resource-efficient Deep Neural Networks (DNNs) is critical to deploy deep learning solutions over edge platforms due to diverse performance, power, and memory budgets.
no code implementations • 6 Jun 2020 • Sheng-Chun Kao, Arun Ramamurthy, Tushar Krishna
We propose a new approach to autonomous quantization and HW-aware tuning.
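As a baseline for what quantization means here, the snippet below uniformly quantizes a weight tensor to int8 and dequantizes it back; the autonomous, HW-aware selection of bit-widths described in the paper is not shown.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric uniform quantization of a weight tensor to int8 (illustrative)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)   # reconstruction used at inference time
```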
2 code implementations • 13 Aug 2019 • Sheng-Chun Kao, Chao-Han Huck Yang, Pin-Yu Chen, Xiaoli Ma, Tushar Krishna
In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance.