Weight pruning of deep neural networks (DNNs) has been widely adopted to meet the limited storage and computing capability of mobile edge devices. Among existing pruning methods, pattern-based sparsity achieves both high inference accuracy and promising on-device acceleration. However, existing weight pruning methods focus mainly on reducing model size and/or improving performance, without considering the privacy of user data. To mitigate this concern, a privacy-preserving-oriented weight pruning and mobile acceleration framework that does not require the original dataset is proposed. At the algorithm level, a systematic weight pruning scheme based on the alternating direction method of multipliers (ADMM) is designed to iteratively solve the joint pattern and connectivity pruning problem for each layer without using the original dataset. This algorithm-level design is matched with corresponding compiler-level optimizations for on-device inference acceleration. With the proposed framework, non-expert users can avoid the time-consuming weight pruning process and directly benefit from compressed models.
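The joint pattern and connectivity pruning described above can be sketched as an ADMM splitting: a data-fitting update of the weights W alternates with an analytical projection of an auxiliary variable Z onto the pattern constraint set. The sketch below is illustrative only, under stated assumptions: the 4-entry, center-anchored 3x3 pattern library, the simple proximal W-step, and all hyperparameters are placeholders, not the framework's actual choices.

```python
import itertools
import numpy as np

# Candidate 3x3 pattern masks: the center entry plus 3 of the 8 neighbors.
# This 4-entry pattern library is an illustrative assumption; the framework's
# actual pattern set may differ.
_CENTER = (1, 1)
_NEIGHBORS = [(i, j) for i in range(3) for j in range(3) if (i, j) != _CENTER]
PATTERNS = []
for combo in itertools.combinations(_NEIGHBORS, 3):
    mask = np.zeros((3, 3))
    mask[_CENTER] = 1.0
    for pos in combo:
        mask[pos] = 1.0
    PATTERNS.append(mask)

def project_to_pattern(kernel, patterns=PATTERNS):
    """Analytical Z-update: keep the pattern mask that preserves the most
    kernel magnitude, zeroing the remaining entries."""
    norms = [np.linalg.norm(kernel * m) for m in patterns]
    return kernel * patterns[int(np.argmax(norms))]

def admm_pattern_prune(W, rho=1e-2, lr=1e-1, steps=30):
    """Iterate the ADMM splitting over a stack of 3x3 kernels W:
    a gradient step pulls W toward the constrained variable Z (a stand-in
    for the data-free loss term), Z is projected exactly onto the pattern
    set, and the scaled dual variable U accumulates the residual."""
    Z = np.stack([project_to_pattern(k) for k in W])
    U = np.zeros_like(W)
    for _ in range(steps):
        W = W - lr * rho * (W - Z + U)                          # W-update
        Z = np.stack([project_to_pattern(k) for k in (W + U)])  # Z-update
        U = U + (W - Z)                                         # dual update
    return Z
```

Each returned kernel then satisfies the pattern constraint exactly: four surviving entries, one of them the kernel center.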
Technology Overview
To deal with these problems, Northeastern University researchers have invented a privacy-preserving-oriented DNN weight pruning and model acceleration framework. At the algorithm level, the proposed framework prunes DNN models provided by users with pattern-based sparsity, without using any information about the original (private) dataset. Specifically, the DNN model is pruned layer by layer with randomly generated samples. Instead of using the loss value, a reconstruction error is introduced to measure whether enough information is maintained after pruning. By formulating weight pruning as a mathematical optimization problem, the proposed framework solves the pattern pruning and connectivity pruning problems iteratively and analytically by extending the potent ADMM algorithm. Furthermore, the resulting structures of pruned DNN models are more hardware friendly and compatible with the code generation capability of compilers. After retraining the compressed model obtained by this framework, users can achieve real-time inference performance without accuracy loss.
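The dataset-free, layer-wise criterion above can be sketched as follows: feed randomly generated samples through a layer before and after pruning and compare the outputs. This is a minimal sketch under stated assumptions; the fully connected layer, the magnitude-pruning stand-in (in place of pattern/connectivity pruning), and the sample count are illustrative, not the framework's actual settings.

```python
import numpy as np

def magnitude_prune(W, keep_ratio=0.5):
    """Keep the largest-magnitude weights (a simple stand-in for the
    framework's pattern and connectivity pruning)."""
    k = int(W.size * keep_ratio)
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def reconstruction_error(W, W_pruned, num_samples=256, seed=0):
    """Relative output discrepancy of a fully connected layer on randomly
    generated Gaussian samples -- no original (private) training data is
    touched at any point."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, W.shape[1]))
    Y, Y_pruned = X @ W.T, X @ W_pruned.T
    return np.linalg.norm(Y - Y_pruned) / np.linalg.norm(Y)
```

A pruning schedule could then, for example, increase a layer's sparsity until the reconstruction error crosses a chosen tolerance, signaling that too much information has been lost.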
Advantages
- It does not require access to the original training dataset.
- Experimental results show that the proposed framework outperforms three state-of-the-art end-to-end DNN frameworks, i.e., TensorFlow-Lite, TVM, and MNN, with speedups of up to 10.8X, 5.4X, and 4.9X, respectively, with no accuracy loss.
- Real-time performance can be achieved on representative DNNs using mobile devices. 
- These evaluations are performed using three widely used DNNs (VGG-16, ResNet-50, and MobileNet-V2) and two benchmark datasets (ImageNet and CIFAR-10).
Applications
- Applicable to any application that requires real-time, fast implementation of deep learning, AI, and machine intelligence systems, especially when users have privacy concerns.
- It will promote the wide application of DNNs on embedded, mobile (smartphone-based), sensor, and IoT systems. 
- Autonomous driving systems, unmanned aerial vehicles (UAVs), and intelligent robotic systems.
- Real-time medical imaging applications. 
- Cloud-based AI and deep learning accelerators. 
- Field testing, road scan, and sensor-based intelligent systems.
- License
- Partnering
- Research collaboration
Patent Information:
- Sensors tech
For Information, Contact:
Mark Saulich
Associate Director of Commercialization
Northeastern University
Inventors:
Yanzhi Wang
Yifan Gong
Zheng Zhan
Keywords:
Artificial intelligence
Deep learning
Embedded Systems
Model Compression
Real-Time Implementation
Training data