no code implementations • 6 Dec 2022 • Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, Mark Horowitz
To create dynamic models, we leverage the resilience of vision transformers to pruning and switch between different scaled versions of a model.
no code implementations • 13 Jun 2022 • Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany
Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT).
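The clipping idea can be illustrated with a minimal sketch (not the paper's method; the threshold values and 4-bit setting here are illustrative assumptions): clipping to a range tighter than the data's full extent shrinks the quantization step size, so rounding noise on in-range values drops, at the cost of a small clipping error on the tails.

```python
import numpy as np

def fake_quantize(x, clip, bits=4):
    # Clip to [-clip, clip], then round onto a uniform grid of 2**bits levels.
    # A smaller clip value shrinks the step size, reducing rounding noise on
    # the values that stay in range, but introduces error on clipped tails.
    step = 2.0 * clip / (2**bits - 1)
    x_clipped = np.clip(x, -clip, clip)
    return np.round(x_clipped / step) * step

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)

# For roughly Gaussian data, a clip tighter than the full data range
# typically lowers total mean-squared quantization error.
err_wide  = np.mean((fake_quantize(x, clip=8.0) - x) ** 2)
err_tight = np.mean((fake_quantize(x, clip=2.5) - x) ** 2)
```

In QAT, this threshold is typically tuned or learned per layer rather than fixed by hand.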
no code implementations • 26 Jun 2021 • Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, Mustafa Ali, Ming-Yu Liu, Brucek Khailany, Bill Dally, Anima Anandkumar
Representing deep neural networks (DNNs) in low-precision is a promising approach to enable efficient acceleration and memory reduction.
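A common low-precision representation (shown here as a generic sketch, not this paper's specific scheme) is symmetric uniform quantization: weights are stored as signed integers with one floating-point scale per tensor, which cuts memory and enables integer arithmetic.

```python
import numpy as np

def quantize_tensor(w, bits=8):
    # Symmetric per-tensor quantization: map floats to signed ints via a
    # single scale chosen so the largest-magnitude weight uses the full range.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_tensor(q, scale):
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_tensor(w)          # int8 storage: 4x smaller than float32
w_hat = dequantize_tensor(q, s)    # reconstruction error bounded by s / 2
```

Practical systems often refine this with per-channel scales or mixed precision to recover accuracy lost at very low bit-widths.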
no code implementations • 8 Feb 2021 • Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally, Brucek Khailany
4-bit weights and 8-bit activations achieve near-full-precision accuracy for both BERT-base and BERT-large on SQuAD while reducing area by 26% compared to an 8-bit baseline.
1 code implementation • Design Automation Conference (DAC) 2019 • Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray
We also introduce an energy model to predict the requirements of high-accuracy AMS hardware running large networks and use it to show that for ADC-dominated designs, there is a direct tradeoff between energy efficiency and network accuracy.
no code implementations • 23 May 2017 • Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally
Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning.