Deep neural networks (DNNs) are both computation- and storage-intensive, which prevents their wide deployment on power-budgeted embedded and IoT systems. To overcome this hurdle, several prior works have focused on DNN model compression techniques that simultaneously reduce model size (storage requirement) and accelerate computation, with minor effect on accuracy. Two important categories of DNN model compression are weight pruning and weight quantization. However, prior work has relied mainly on heuristic methods, and an effective combination of weight pruning and quantization has been lacking.
Technology Overview
Northeastern researchers propose ADMM-NN. The first part of ADMM-NN is a systematic, joint framework for DNN weight pruning and quantization built on the powerful mathematical optimization framework ADMM (Alternating Direction Method of Multipliers). ADMM can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in high performance in DNN model compression.
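The ADMM iteration described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the network training loss is replaced by a toy quadratic, and the function names, learning rate, and penalty parameter `rho` are all assumptions for illustration. The key idea it shows is the dynamically updated regularization target `Z`, re-projected onto the sparsity constraint in each ADMM iteration.

```python
# Minimal sketch of ADMM-based weight pruning on one weight matrix.
# NOT the paper's code: the toy loss ||W - W_ref||^2 stands in for the
# network training loss; rho, lr, and iteration counts are assumptions.
import numpy as np

def project_sparse(W, keep_ratio):
    """Euclidean projection onto matrices with at most keep_ratio
    fraction of nonzero entries (keep the largest magnitudes)."""
    k = max(1, int(W.size * keep_ratio))
    thresh = np.sort(np.abs(W).ravel())[-k]   # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

def admm_prune(W_ref, keep_ratio=0.25, rho=1.0, iters=50, lr=0.1):
    W = W_ref.copy()
    Z = project_sparse(W, keep_ratio)   # auxiliary variable (sparse copy)
    U = np.zeros_like(W)                # scaled dual variable
    for _ in range(iters):
        # W-step: minimize loss + (rho/2)||W - Z + U||^2 by gradient descent.
        # The rho term acts as a regularizer whose target Z is updated
        # dynamically each iteration -- the "smart regularization" view.
        for _ in range(20):
            grad = (W - W_ref) + rho * (W - Z + U)
            W -= lr * grad
        Z = project_sparse(W + U, keep_ratio)   # Z-step: projection
        U += W - Z                              # dual update
    return project_sparse(W, keep_ratio)        # final hard prune

rng = np.random.default_rng(0)
W_pruned = admm_prune(rng.standard_normal((8, 8)), keep_ratio=0.25)
sparsity = float(np.mean(W_pruned == 0.0))
```

With `keep_ratio=0.25`, roughly 75% of the weights end up exactly zero, while the surviving weights have been co-trained against the regularizer rather than pruned in one heuristic shot.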
The second part of ADMM-NN is hardware-aware optimization of DNNs to facilitate efficient hardware implementations. This technique adopts the concept of the break-even pruning ratio: the minimum weight pruning ratio of a specific DNN layer that will not cause hardware performance (speed) degradation. These ratios are hardware platform-specific. Based on the calculation of such ratios through hardware synthesis, an efficient DNN model compression algorithm is developed for computation reduction and efficient hardware implementation.
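The role of the break-even ratio can be illustrated with a toy per-layer decision rule. All layer names and numbers below are hypothetical (the real ratios come from hardware synthesis, as described above): a layer is pruned only when its achievable pruning ratio exceeds the platform's break-even ratio, since below that point the overhead of irregular sparsity would slow the hardware down.

```python
# Toy illustration of break-even pruning ratios. Layer names and all
# ratio values are hypothetical, not from the paper; real break-even
# ratios are obtained through hardware synthesis per platform.
achievable = {"conv1": 0.30, "conv2": 0.75, "fc1": 0.95}   # per-layer pruning
break_even = {"conv1": 0.50, "conv2": 0.60, "fc1": 0.70}   # platform-specific

def layers_to_prune(achievable, break_even):
    """Prune a layer only if pruning actually helps on this hardware."""
    return [l for l, r in achievable.items() if r > break_even[l]]

print(layers_to_prune(achievable, break_even))  # ['conv2', 'fc1']
```

Here `conv1` is left dense: its achievable 30% pruning is below the 50% break-even point, so pruning it would shrink storage but hurt speed on this platform.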
Benefits:
- It can achieve 85X and 24X pruning on representative LeNet-5 and AlexNet models, respectively, without accuracy loss
- It can achieve 1,910X and 231X reductions in overall model size on these two benchmarks when focusing on weight data storage
- Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50
- Significant on-chip acceleration of DNN inference due to reduced computation, storage, and communication costs, with the potential to store entire DNN models on-chip
Applications:
- It can be used in applications of DNNs on embedded, mobile, and IoT systems
- Autonomous driving systems, unmanned aerial vehicles (UAVs), and intelligent robotic systems
- Real-time medical imaging applications
- Cloud-based AI and deep learning accelerators
- Field testing, road scan, and sensor-based intelligent systems
Opportunities:
- Licensing
- Partnering
- Research collaboration
Patent Information:
For Information, Contact:
Mark Saulich
Associate Director of Commercialization
Northeastern University
Keywords:
Artificial intelligence
Deep learning
Embedded Systems
Model Compression
Real-Time Implementation