Recurrent Neural Network (RNN)-based automatic speech recognition has become prevalent, and it is desirable to deploy such RNNs — e.g., Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) networks — on mobile devices and execute them in real time. A model compression and mobile acceleration framework is needed to facilitate real-time RNN execution. Current model compression techniques for RNNs, such as ESE (Efficient Speech Recognition Engine) and C-LSTM, suffer from limited compression rates or produce irregular structures incompatible with hardware acceleration, and they focus mainly on FPGA (Field-Programmable Gate Array) implementations. Furthermore, existing deep learning acceleration frameworks for mobile devices, such as TensorFlow Lite, focus on feed-forward Deep Neural Networks (DNNs) and do not support RNNs. Therefore, to achieve real-time RNN inference on mobile devices, an end-to-end RNN acceleration framework is needed that delivers both high inference accuracy and high computational efficiency.
Technology Overview
Northeastern researchers have developed RTMobile, a real-time RNN acceleration framework for mobile devices. It is the first framework to support RNN inference on mobile devices with real-time performance. RTMobile is composed of two main components: block-based structured pruning and compiler-assisted performance optimizations. Unlike conventional structured pruning methods used on DNNs, the novel block-based structured pruning divides each weight matrix into blocks and applies structured pruning within each block, removing groups of weights that contribute little to inference accuracy. This approach maintains high inference accuracy while significantly reducing the RNN model size, achieving the advantages of both structured and non-structured weight pruning while avoiding their weaknesses. Newly developed compiler-based optimization techniques determine the block size (in block-based structured pruning) and generate optimized code for mobile devices. These techniques include matrix reorder, load redundancy elimination, and a new compact data format for pruned model storage, called BSPC (Block-based Structured Pruning Compact) format.
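The block-based structured pruning idea can be sketched as follows. This is a minimal illustration under assumed parameters, not the actual RTMobile implementation: the function name `block_structured_prune`, the block count, the keep ratio, and the use of per-column L2 norms as the importance measure are all assumptions for demonstration.

```python
import numpy as np

def block_structured_prune(W, num_blocks=4, keep_ratio=0.25):
    """Illustrative sketch: divide W row-wise into blocks; within each
    block, keep only the columns with the largest L2 norms (structured
    pruning at block granularity) and zero out the rest."""
    H, _ = W.shape
    assert H % num_blocks == 0, "sketch assumes rows divide evenly into blocks"
    pruned = W.copy()
    rows = H // num_blocks
    for b in range(num_blocks):
        block = pruned[b * rows:(b + 1) * rows, :]   # view into pruned
        norms = np.linalg.norm(block, axis=0)        # per-column importance
        k = int(len(norms) * keep_ratio)             # columns to keep
        keep = np.argsort(norms)[-k:]                # indices of top-k columns
        mask = np.zeros(len(norms), dtype=bool)
        mask[keep] = True
        block[:, ~mask] = 0.0                        # prune whole columns in block
    return pruned

W = np.random.randn(8, 16)
Wp = block_structured_prune(W, num_blocks=4, keep_ratio=0.25)
print(np.count_nonzero(Wp) / Wp.size)  # fraction of weights remaining
```

Because the zeros form whole columns within each block, the surviving weights stay in regular patterns a compiler can exploit (e.g., via matrix reorder and a compact storage format), unlike irregular element-wise pruning.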
- RTMobile supports RNN inference on mobile devices and achieves faster-than-real-time performance
- For the speech recognition task on the TIMIT dataset, RTMobile can compress the GRU model by over 10X without losing accuracy
- For mobile acceleration, RTMobile can achieve a 50X improvement in energy efficiency
- Applicable to any application requiring real-time, fast implementation of deep learning and AI systems, promoting the wide adoption of RNNs on embedded, mobile, and IoT (Internet of Things) systems
- Real-time speech recognition, Natural Language Processing (NLP), and human-machine interaction systems
- Autonomous driving systems, Unmanned Aerial Vehicles (UAVs), and intelligent robotic systems
- Cloud-based AI and deep learning accelerators
- Field testing, road scanning, and sensor-based intelligent systems
- License
- Partnering
- Research collaboration
Patent Information:
For Information, Contact:
Colin Sullivan
Commercialization Consultant
Northeastern University
Yanzhi Wang
Peiyan Dong
Zhengang Li
Wei Niu
Artificial intelligence
Deep learning
Embedded Systems
Model Compression
Real-Time Implementation
Recurrent Neural Network