Search Results for author: Ritchie Zhao

Found 8 papers, 3 papers with code

Microscaling Data Formats for Deep Learning

1 code implementation • 16 Oct 2023 • Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Maxim Naumov, Colin Verrilli, Ralph Wittig, Doug Burger, Eric Chung

Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications.

Friction

108

Paper
Code

With Shared Microexponents, A Little Shifting Goes a Long Way

no code implementations • 16 Feb 2023 • Bita Rouhani, Ritchie Zhao, Venmugil Elango, Rasoul Shafipour, Mathew Hall, Maral Mesmakhosroshahi, Ankit More, Levi Melnick, Maximilian Golub, Girish Varatkar, Lei Shao, Gaurav Kolhe, Dimitry Melts, Jasmine Klar, Renee L'Heureux, Matt Perry, Doug Burger, Eric Chung, Zhaoxia Deng, Sam Naghshineh, Jongsoo Park, Maxim Naumov

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning.

Quantization Recommendation Systems

Paper
Add Code

Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point

no code implementations • NeurIPS 2020 • Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao, Ming Liu, Jeremy Fowers, Kalin Ovtcharov , Anna Vinogradsky, Sarah Massengill , Lita Yang, Ray Bittner, Alessandro Forin, Haishan Zhu, Taesik Na, Prerak Patel, Shuai Che, Lok Chand Koppaka , Xia Song, Subhojit Som, Kaustav Das, Saurabh T, Steve Reinhardt , Sitaram Lanka, Eric Chung, Doug Burger

In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware.

Image Classification Question Answering

Paper
Add Code

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

1 code implementation • ICLR 2020 • Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang

The proposed approach is applicable to a variety of DNN architectures and significantly reduces the computational cost of DNN execution with almost no accuracy loss.

Quantization

Paper
Code

OverQ: Opportunistic Outlier Quantization for Neural Network Accelerators

no code implementations • 13 Oct 2019 • Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang

Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.

Quantization

Paper
Add Code

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

3 code implementations • 28 Jan 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.

Language Modelling Neural Network Compression +1

4,309

Paper
Code

Building Efficient Deep Neural Networks with Unitary Group Convolutions

no code implementations • CVPR 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i. e. ShuffleNet) and block-circulant networks (i. e. CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.

Paper
Add Code

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

no code implementations • 15 Jul 2017 • Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta

State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.