With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to mobile platforms. However, executing Deep Neural Network (DNN) inference is still challenging given its high computation and storage demands, particularly when real-time performance with high accuracy is needed. Weight pruning of DNNs has been proposed to address this, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware-friendly, while structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss.
Technology Overview
Northeastern University researchers have proposed PatDNN, a novel end-to-end mobile DNN acceleration framework that generates highly accurate DNN models using pattern-based pruning methods and guarantees execution efficiency through compiler optimizations. PatDNN consists of two stages:
(1) Pattern-based training stage, which performs kernel pattern and connectivity pruning (termed pattern-based pruning in general) with pattern set generation and an extended ADMM solution framework. 
(2) Execution code generation stage, which converts DNN models into computational graphs and applies multiple optimizations, including a high-level and fine-grained DNN layerwise representation, filter kernel reorder, load redundancy elimination, and automatic parameter tuning.
All design optimizations are general and applicable to both mobile CPUs and GPUs.
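The two pruning steps named in stage (1) can be illustrated with a minimal sketch. It is not the PatDNN implementation: it replaces the extended ADMM training framework with a one-shot magnitude-based projection, and the 3x3 kernel size, the four-entry patterns in `PATTERNS`, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only. Kernels are flat lists of 9 weights (3x3, row-major).
# PATTERNS is a hypothetical pattern set: each pattern lists the 4 positions
# that are kept; every other position in the kernel is set to zero.
PATTERNS = [
    (0, 1, 3, 4),
    (1, 2, 4, 5),
    (3, 4, 6, 7),
    (4, 5, 7, 8),
]


def best_pattern(kernel):
    """Return the index of the pattern that retains the most squared magnitude."""
    def kept_energy(pattern):
        return sum(kernel[i] ** 2 for i in pattern)
    return max(range(len(PATTERNS)), key=lambda p: kept_energy(PATTERNS[p]))


def apply_pattern(kernel, pattern_id):
    """Kernel pattern pruning: zero every weight outside the chosen pattern."""
    keep = set(PATTERNS[pattern_id])
    return [w if i in keep else 0.0 for i, w in enumerate(kernel)]


def connectivity_prune(kernels, keep_count):
    """Connectivity pruning: zero whole kernels, keeping the keep_count
    kernels with the largest squared L2 norm."""
    norms = [sum(w * w for w in k) for k in kernels]
    ranked = sorted(range(len(kernels)), key=lambda i: norms[i], reverse=True)
    keep = set(ranked[:keep_count])
    return [k if i in keep else [0.0] * len(k) for i, k in enumerate(kernels)]


def pattern_prune(kernels, keep_count):
    """Apply kernel pattern pruning, then connectivity pruning."""
    patterned = [apply_pattern(k, best_pattern(k)) for k in kernels]
    return connectivity_prune(patterned, keep_count)


if __name__ == "__main__":
    kernels = [
        [0.9, 0.8, 0.1, 0.7, 0.6, 0.0, 0.1, 0.0, 0.2],
        [0.01] * 9,
        [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.5, 0.5],
    ]
    for pruned in pattern_prune(kernels, keep_count=2):
        print(pruned)
```

Because every surviving kernel follows one of a small number of known patterns, a compiler can in principle generate specialized code per pattern, which is the regularity that the optimizations in stage (2) rely on.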
- PatDNN outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network (MNN), with speedups of up to 44.5X, 11.4X, and 7.1X, respectively, with no accuracy loss
- These tests were performed using three widely used DNNs (VGG-16, ResNet-50, and MobileNet-V2) and two benchmark datasets (ImageNet and CIFAR-10)
- Using the Adreno 640 embedded GPU (in a state-of-the-art smartphone), PatDNN achieves an unprecedented 18.9ms inference time for VGG-16 on the ImageNet dataset
- Achieves real-time inference of representative large-scale DNNs on mobile devices
- This invention is generally applicable to any application that requires real-time, fast implementation of deep learning and AI systems, and will promote the wide application of DNNs on embedded, mobile, and IoT systems
- Autonomous driving systems, unmanned aerial vehicles (UAVs), and intelligent robotic systems
- Real-time medical imaging applications
- Cloud-based AI and deep learning accelerators
- Field testing, road scan, and sensor-based intelligent systems
- License
- Partnering
- Research collaboration
Patent Information:
For Information, Contact:
Mark Saulich
Associate Director of Commercialization
Northeastern University
Yanzhi Wang
Xiaolong Ma
Wei Niu
Bin Ren
Artificial intelligence
Deep learning
Mobile devices
Model Compression
Real-Time Implementation