no code implementations • 26 Feb 2024 • Mustafa Altay Karamuftuoglu, Beyza Zeynep Ucpinar, Arash Fayyazi, Sasan Razmkhah, Mehdi Kamal, Massoud Pedram
A novel high-fan-in differential superconductor neuron structure designed for ultra-high-performance Spiking Neural Network (SNN) accelerators is presented.
no code implementations • 8 Feb 2024 • Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram
This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs.
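As a rough illustration of the idea, the sketch below factorizes one weight matrix with a plain truncated SVD; the paper's method additionally uses activation statistics to choose per-layer ranks, which this toy version omits (matrix sizes and the rank are hypothetical):

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Split a dense weight W (out x in) into two thin factors.

    Plain truncated SVD; an activation-aware method would weight the
    decomposition by activation statistics and pick `rank` per layer,
    both of which this sketch omits.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out x rank)
    B = Vt[:rank, :]             # (rank x in)
    return A, B                  # W ~= A @ B

W = np.random.randn(768, 3072)
A, B = low_rank_factorize(W, rank=64)
print(W.size, A.size + B.size)   # 64*(768+3072) << 768*3072
```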
no code implementations • 3 Dec 2023 • Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram
This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed-point (FixP) number systems.
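A minimal sketch of the fixed-point half of such a scheme is below; posit arithmetic requires a dedicated emulation library and is omitted here, and the bit widths are hypothetical:

```python
import numpy as np

def to_fixp(x, total_bits=8, frac_bits=4):
    """Quantize to a signed fixed-point grid with `frac_bits`
    fractional bits. FixP sketch only; the posit side of the
    mixed-computation scheme is not shown."""
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale  # value actually representable in FixP

x = np.array([0.13, -1.7, 3.999])
print(to_fixp(x))  # values snapped onto the fixed-point grid
```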
no code implementations • 11 Oct 2023 • Beyza Zeynep Ucpinar, Mustafa Altay Karamuftuoglu, Sasan Razmkhah, Massoud Pedram
We present an on-chip trainable neuron circuit.
no code implementations • 12 Aug 2023 • Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.
no code implementations • 14 Jul 2023 • Aupam Hamran, Marzieh Vaeztourshizi, Amirhossein Esmaili, Massoud Pedram
Different CNN architecture optimization techniques, such as widening and deepening the network and adding skip connections, are applied to improve the accuracy of the network.
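For readers unfamiliar with skip connections, the toy PyTorch block below shows the general pattern; the channel counts are hypothetical, and the paper's actual architecture is not reproduced here:

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Toy residual block: 'deepening' plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        y = self.act(self.conv1(x))
        y = self.conv2(y)
        return self.act(x + y)  # skip connection eases optimization

x = torch.randn(1, 32, 64, 64)
print(SkipBlock(32)(x).shape)   # spatial dims preserved
```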
no code implementations • 8 May 2023 • Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram
Post-training compression techniques such as pruning and quantization can help lower deployment costs.
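The snippet below sketches the generic textbook recipe (magnitude pruning followed by symmetric uniform quantization), not the paper's specific method; the sparsity level and bit width are illustrative:

```python
import numpy as np

def prune_and_quantize(W, sparsity=0.5, bits=8):
    """Post-training compression sketch: magnitude pruning, then
    symmetric per-tensor quantization."""
    # Pruning: zero out the smallest-magnitude weights.
    k = int(W.size * sparsity)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    W = np.where(np.abs(W) < thresh, 0.0, W)
    # Quantization: map remaining weights onto a symmetric int grid.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).astype(np.int8), scale

W = np.random.randn(128, 128).astype(np.float32)
q, s = prune_and_quantize(W)
print((q == 0).mean(), s)  # ~50% zeros, one scale per tensor
```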
1 code implementation • 4 Mar 2023 • Jung Hwan Heo, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram
Token pruning has emerged as an effective solution to speed up the inference of large Transformer models.
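A minimal sketch of score-based token pruning is shown below, keeping the tokens the [CLS] query attends to most; the paper's actual pruning policy may differ, and the shapes are those of a ViT-Base-like model:

```python
import torch

def prune_tokens(x, attn, keep_ratio=0.5):
    """Drop the least-attended tokens between Transformer layers.

    x:    (batch, tokens, dim) hidden states, token 0 = [CLS]
    attn: (batch, heads, tokens, tokens) attention probabilities
    """
    # Importance = attention the [CLS] query assigns to each token.
    score = attn.mean(dim=1)[:, 0, 1:]        # (batch, tokens-1)
    k = int(score.size(1) * keep_ratio)
    idx = score.topk(k, dim=1).indices + 1    # shift past [CLS]
    idx, _ = idx.sort(dim=1)                  # keep original order
    cls = x[:, :1]
    kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
    return torch.cat([cls, kept], dim=1)

x = torch.randn(2, 197, 768)
attn = torch.softmax(torch.randn(2, 12, 197, 197), dim=-1)
print(prune_tokens(x, attn).shape)  # (2, 99, 768)
```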
no code implementations • 30 Jul 2022 • Soheil Nazar Shahsavani, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram
Recent efforts for improving the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed function combinational logic.
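As a toy illustration of the idea, the snippet below enumerates a single binary-input neuron into a truth table, which is the form a synthesis tool can map to fixed-function gates; the weights and threshold are hypothetical:

```python
import itertools
import numpy as np

# A trained neuron over binary inputs can be replaced by a truth
# table and then realized as fixed-function combinational logic.
w = np.array([2, -1, 3, -2])  # hypothetical trained weights
b = -1                        # hypothetical bias

truth_table = {}
for bits in itertools.product([0, 1], repeat=4):
    truth_table[bits] = int(np.dot(w, bits) + b > 0)

print(truth_table[(1, 0, 1, 0)])  # -> 1; this map becomes gates
```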
no code implementations • 30 Jun 2022 • Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram
This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks.
no code implementations • 28 Mar 2022 • Souvik Kundu, Sairam Sundaresan, Massoud Pedram, Peter A. Beerel
In this paper, we present a fast learnable once-for-all adversarial training (FLOAT) algorithm, which, instead of the existing FiLM-based conditioning, uses a unique weight-conditioned learning scheme that requires no additional layers and thus incurs no significant increase in parameter count, training time, or network latency compared to standard adversarial training.
no code implementations • 24 Dec 2021 • Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram
Compared to the baseline FP-32 models, BMPQ can yield models that have 15.4x fewer parameter bits with a negligible drop in accuracy.
no code implementations • NeurIPS 2021 • Souvik Kundu, Qirui Sun, Yao Fu, Massoud Pedram, Peter Beerel
Knowledge distillation (KD) has recently been identified as a method that can unintentionally leak private information regarding the details of a teacher model to an unauthorized student.
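For context, the snippet below shows the standard distillation loss (Hinton et al.), the channel through which a teacher's soft outputs flow to a student; it is the textbook formulation, not the paper's leakage analysis or defense:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard KD loss: the student matches the teacher's softened
    outputs -- the 'dark knowledge' that can leak a teacher's private
    details to an unauthorized student."""
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

s, t = torch.randn(8, 10), torch.randn(8, 10)
print(kd_loss(s, t).item())
```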
1 code implementation • ICCV 2021 • Souvik Kundu, Massoud Pedram, Peter A. Beerel
Low-latency deep spiking neural networks (SNNs) have become a promising alternative to conventional artificial neural networks (ANNs) because of their potential for increased energy efficiency on event-driven neuromorphic hardware.
no code implementations • 16 Jul 2021 • Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel
To evaluate the merits of our approach, we performed experiments with variants of VGG and ResNet on both CIFAR-10 and CIFAR-100, and with VGG16 on Tiny-ImageNet. The SNN models generated through the proposed technique yield SOTA compression ratios of up to 33.4x with no significant drop in accuracy compared to the unpruned baselines.
no code implementations • 7 Apr 2021 • Mahdi Nazemi, Arash Fayyazi, Amirhossein Esmaili, Atharva Khare, Soheil Nazar Shahsavani, Massoud Pedram
While there is a large body of research on efficient processing of deep neural networks (DNNs), ultra-low-latency realization of these models for applications with stringent, sub-microsecond latency requirements continues to be an unresolved, challenging problem.
no code implementations • 24 Jan 2021 • Mohsen Ahmadzadeh, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram
In this work, to limit the number of required attention inference hops in memory-augmented neural networks, we propose an online adaptive approach called A2P-MANN.
no code implementations • 7 Jan 2021 • Seyed Abolfazl Ghasemzadeh, Erfan Bank Tavakoli, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram
In this paper, first, a hardware-friendly pruning algorithm for reducing energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented.
1 code implementation • 3 Nov 2020 • Souvik Kundu, Mahdi Nazemi, Peter A. Beerel, Massoud Pedram
This paper presents a dynamic network rewiring (DNR) method to generate pruned deep neural network (DNN) models that are robust against adversarial attacks yet maintain high accuracy on clean images.
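For context, the sketch below generates the single-step (FGSM) adversarial examples typically used in such robust training; DNR's pruning-aware training schedule itself is not shown:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One FGSM step (Goodfellow et al.): perturb the input along the
    sign of the loss gradient to craft an adversarial image."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)  # train on these for robustness
```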
no code implementations • 30 Jul 2020 • Mahdi Nazemi, Amirhossein Esmaili, Arash Fayyazi, Massoud Pedram
The proposed hybrid machine learning model has the same level of accuracy (i.e., $\pm$1%) as NNs while achieving at least a 10% improvement in accuracy compared to HD learning models.
1 code implementation • 3 Jul 2020 • Ghasem Pasandi, Mackenzie Peterson, Moises Herrera, Shahin Nazarian, Massoud Pedram
This paper aims at integrating three powerful techniques namely Deep Learning, Approximate Computing, and Low Power Design into a strategy to optimize logic at the synthesis level.
no code implementations • 13 Feb 2020 • Mohammad Saeed Abrishami, Hao Ge, Justin F. Calderon, Massoud Pedram, Shahin Nazarian
The shrinking of transistor geometries, together with the increasing complexity of integrated circuits, significantly aggravates nonlinear design behavior.
no code implementations • 13 Feb 2020 • Mohammad Saeed Abrishami, Massoud Pedram, Shahin Nazarian
The miniaturization of transistors down to 5 nm and beyond, together with the increasing complexity of integrated circuits, significantly aggravates short-channel effects and demands the analysis and optimization of more design corners and modes.
no code implementations • 12 Feb 2020 • Mohammad Saeed Abrishami, Amir Erfan Eshratifar, David Eigen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram
However, fine-tuning a transfer model with data augmentation in the raw input space is computationally costly, because the full network must be run for every augmented input.
1 code implementation • 29 Jan 2020 • Souvik Kundu, Mahdi Nazemi, Massoud Pedram, Keith M. Chugg, Peter A. Beerel
We also compared the performance of our proposed architectures with that of ShuffleNet and MobileNetV2.
no code implementations • 14 Jan 2020 • Amir Erfan Eshratifar, Massoud Pedram
The proposed algorithm allows the mobile device to detect the inputs that can be processed locally and the ones that require a larger model and should be sent to a cloud server.
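A minimal sketch of one plausible gating rule is below, deferring low-confidence inputs to the cloud via a max-softmax threshold; the paper learns this decision rather than hand-thresholding it:

```python
import torch
import torch.nn.functional as F

def route(small_model, x, conf_threshold=0.9):
    """Keep confident predictions on-device; flag the rest for the
    larger cloud model. Simple max-softmax gating sketch."""
    probs = F.softmax(small_model(x), dim=-1)
    conf, pred = probs.max(dim=-1)
    to_cloud = conf < conf_threshold  # boolean mask per input
    return pred, to_cloud
```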
no code implementations • 11 Dec 2019 • Amirhossein Esmaili, Massoud Pedram
Energy consumption is one of the most critical concerns in designing computing devices, ranging from portable embedded systems to computer cluster systems.
no code implementations • 6 Sep 2019 • Amir Erfan Eshratifar, David Eigen, Michael Gormish, Massoud Pedram
Small inter-class and large intra-class variations are the main challenges in fine-grained visual classification.
no code implementations • 11 May 2019 • Ting-Ru Lin, Drew Penney, Massoud Pedram, Lizhong Chen
Machine learning applied to architecture design presents a promising opportunity with broad applications.
no code implementations • 4 Feb 2019 • Amir Erfan Eshratifar, Amirhossein Esmaili, Massoud Pedram
Recent studies have shown that the latency and energy consumption of deep neural networks can be significantly reduced by splitting the network between the mobile device and the cloud.
no code implementations • 1 Feb 2019 • Amir Erfan Eshratifar, Amirhossein Esmaili, Massoud Pedram
In this approach, referred to as collaborative intelligence, intermediate features computed on the mobile device are offloaded to the cloud instead of the raw input data of the network, reducing the size of the data needed to be sent to the cloud.
Distributed, Parallel, and Cluster Computing
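The toy split below illustrates the general offloading pattern, with hypothetical layer sizes; the contribution of work in this area lies in choosing the split point and compressing the feature tensor, neither of which this sketch does:

```python
import torch
import torch.nn as nn

# Split a CNN into a device-side head and a cloud-side tail: only the
# intermediate feature tensor is uploaded, not the raw image.
head = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                     nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                     nn.Conv2d(32, 32, 3, stride=2), nn.ReLU())
tail = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(32, 10))

image = torch.randn(1, 3, 224, 224)
features = head(image)                  # runs on the mobile device
print(image.numel(), features.numel())  # 150528 vs 23328: ~6x less data
logits = tail(features)                 # runs in the cloud
```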
no code implementations • 1 Feb 2019 • Ghasem Pasandi, Shahin Nazarian, Massoud Pedram
Approximate Logic Synthesis (ALS) is the process of synthesizing and mapping a given Boolean network to a library of logic cells so that the magnitude/rate of error between outputs of the approximate and initial (exact) Boolean netlists is bounded from above by a predetermined total error threshold.
Hardware Architecture
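As a rough illustration, the snippet below Monte-Carlo-estimates the error rate between an exact and an approximate netlist, both modeled as hypothetical Python functions; a real ALS flow bounds this error during synthesis rather than estimating it afterward:

```python
import numpy as np

def error_rate(exact, approx, n_inputs, trials=100_000, seed=0):
    """Estimate how often the approximate circuit's outputs differ
    from the exact circuit's over uniformly random input vectors."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(trials, n_inputs), dtype=np.uint8)
    mismatch = np.any(exact(x) != approx(x), axis=1)
    return mismatch.mean()

# Hypothetical 2-output circuits: approx drops the AND's last input.
exact_f = lambda x: np.stack(
    [x[:, 0] & x[:, 1] & x[:, 2], x[:, 0] ^ x[:, 1]], axis=1)
approx_f = lambda x: np.stack(
    [x[:, 0] & x[:, 1], x[:, 0] ^ x[:, 1]], axis=1)
print(error_rate(exact_f, approx_f, n_inputs=3))  # ~0.125
```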
no code implementations • 30 Dec 2018 • Shayan Tabatabaei Nikkhah, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram
The results on various benchmarks demonstrate significant improvements in the prediction accuracy compared to the prior works which used only the accelerator inputs for the prediction.
1 code implementation • 19 Dec 2018 • Amirhossein Esmaili, Mahdi Nazemi, Massoud Pedram
Energy efficiency is one of the most critical design criteria for modern embedded systems such as multiprocessor system-on-chips (MPSoCs).
Operating Systems • Distributed, Parallel, and Cluster Computing
no code implementations • 18 Oct 2018 • Amir Erfan Eshratifar, David Eigen, Massoud Pedram
Therefore, the degree to which each task contributes to the parameter updates is controlled by introducing a set of weights on the tasks' loss functions.
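A minimal sketch of the mechanism follows, with hand-picked static weights; the interesting question is how these weights are set, which this toy version does not address:

```python
import torch

def weighted_multitask_loss(task_losses, weights):
    """Scale each task's loss so its gradient contribution to the
    shared parameters is controlled."""
    return sum(w * l for w, l in zip(weights, task_losses))

losses = [torch.tensor(1.2), torch.tensor(0.4)]  # per-task losses
print(weighted_multitask_loss(losses, weights=[0.7, 0.3]))
```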
no code implementations • 21 Sep 2018 • Amir Erfan Eshratifar, Mohammad Saeed Abrishami, David Eigen, Massoud Pedram
Transfer-learning and meta-learning are two effective methods to apply knowledge learned from large data sources to new tasks.
no code implementations • 23 Jul 2018 • Mahdi Nazemi, Ghasem Pasandi, Massoud Pedram
Deep neural networks have been successfully deployed in a wide variety of applications including computer vision and speech recognition.
no code implementations • 3 Jun 2018 • Mahdi Nazemi, Massoud Pedram
Lop allows researchers and designers to quickly compare the quality of their models using various data representations and arithmetic operations in Python, and to contrast the hardware costs of viable representations by synthesizing them on their target platforms (e.g., FPGA or ASIC).
no code implementations • 2 Feb 2018 • Ruizhe Cai, Ao Ren, Ning Liu, Caiwen Ding, Luhao Wang, Xuehai Qian, Massoud Pedram, Yanzhi Wang
In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs.
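For context, the layer below samples Gaussian weights via the reparameterization trick, the core per-inference operation a BNN accelerator must perform; it is a generic mean-field sketch, not VIBNN's hardware design:

```python
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    """Mean-field Gaussian layer: each forward pass draws a fresh
    weight sample, which is why BNN inference leans so heavily on
    fast Gaussian random number generation."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.log_sigma = nn.Parameter(torch.full((n_out, n_in), -3.0))

    def forward(self, x):
        eps = torch.randn_like(self.mu)           # Gaussian RNG
        w = self.mu + self.log_sigma.exp() * eps  # sampled weights
        return x @ w.t()

print(BayesianLinear(4, 2)(torch.randn(3, 4)).shape)  # (3, 2)
```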
no code implementations • 25 Jan 2018 • Amir Erfan Eshratifar, Mohammad Saeed Abrishami, Massoud Pedram
Deep learning models are being deployed in many mobile intelligent applications.
no code implementations • 11 Jan 2018 • Mahdi Nazemi, Amir Erfan Eshratifar, Massoud Pedram
With ever-increasing application of machine learning models in various domains such as image classification, speech recognition and synthesis, and health care, designing efficient hardware for these models has gained a lot of popularity.
no code implementations • 13 Dec 2017 • Sheng Lin, Ning Liu, Mahdi Nazemi, Hongjia Li, Caiwen Ding, Yanzhi Wang, Massoud Pedram
The large model size of DNNs, while providing excellent accuracy, also burdens the embedded platforms with intensive computation and storage.
no code implementations • 6 Jul 2017 • Mahdi Nazemi, Shahin Nazarian, Massoud Pedram
Independent Component Analysis (ICA) is a dimensionality reduction technique that can boost the efficiency of machine learning models that deal with probability density functions, e.g., Bayesian neural networks.
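A minimal usage example with scikit-learn's FastICA on synthetic mixed signals (the signal shapes and mixing matrix are arbitrary):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Unmix two linearly mixed signals, the standard ICA use case.
rng = np.random.default_rng(0)
s = np.c_[np.sign(np.sin(np.linspace(0, 8, 500))),  # square wave
          rng.uniform(-1, 1, 500)]                   # noise signal
x = s @ np.array([[1.0, 0.5], [0.4, 1.0]])           # observed mixtures

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(x)  # statistically independent components
print(recovered.shape)            # (500, 2)
```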
1 code implementation • 25 Oct 2007 • Ali Iranli, Hanif Fatemi, Massoud Pedram
In this paper, a method is proposed for finding a pixel transformation function that maximizes backlight dimming while maintaining a pre-specified image distortion level for a liquid crystal display.
Other Computer Science
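A toy version of the underlying trade-off: dim the backlight and boost pixels to compensate, with clipping as the source of distortion; the linear transform below is a simplification of the paper's optimized transformation:

```python
import numpy as np

def dim_backlight(img, dim=0.7):
    """Perceived luminance ~ backlight * pixel, so boost pixels by
    1/dim and clip to [0, 1]. Clipped (saturated) pixels are what
    produce image distortion, which the paper bounds from above."""
    boosted = np.clip(img / dim, 0.0, 1.0)
    perceived = boosted * dim
    distortion = np.mean((perceived - img) ** 2)
    return boosted, distortion

img = np.random.rand(64, 64)     # normalized grayscale image
out, mse = dim_backlight(img)
print(mse)                       # error from pixels saturated at 1.0
```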