Human pose estimation aims to recover an interpretable, low-dimensional representation of the human body, which is useful in many real-world applications such as sports, security, self-driving cars, and robotics. Speed and accuracy are the two major concerns in these applications, and existing methods often sacrifice accuracy to gain speed. This technology is a light-weight pose estimation network with a multi-scale heatmap fusion mechanism that estimates 2D human poses from a single RGB image. It runs in real time on mobile devices and achieves accuracy comparable to state-of-the-art methods.
Technology Overview
In this invention, Northeastern University researchers propose a light-weight pose estimation network with a multi-scale heatmap fusion mechanism. As in other pose estimation models, the network has two parts: a backbone architecture and a head structure.
To keep model complexity low, the backbone uses a plug-and-play structure named the Low-rank Pointwise Residual (LPR) module. On the one hand, computation cost and parameter count drop significantly when the number of filters is much smaller than the number of input channels. On the other hand, to compensate for the low rank of the pointwise convolution and the performance loss caused by this compression, a residual operation through the depth-wise convolution complements the feature maps without any additional parameters. To achieve better performance on the pose estimation task, the LPR module is applied to the HRNet architecture, which is specifically designed for pose estimation and achieves state-of-the-art performance by maintaining high-resolution representations through the whole process.
To further improve performance, the head implements a novel multi-scale heatmap estimation and fusion mechanism, which localizes joints from extracted feature maps at multiple scales and combines the multi-scale results into a final estimate. The technique localizes body joints at different scales using a single estimating layer, which addresses the scaling problem and is robust to the large variety of human body poses. This head design further boosts pose estimation accuracy.
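The parameter savings claimed for the LPR module can be illustrated with simple arithmetic. The sketch below is not the inventors' implementation: it assumes the low-rank pointwise convolution is realized as two 1x1 convolutions (input channels to a small number of filters, then to the output channels), and the function names and channel sizes are hypothetical.

```python
# Illustrative parameter counts for a 1x1 (pointwise) convolution.
# ASSUMPTION: the low-rank variant is modeled as two 1x1 convolutions,
# c_in -> rank -> c_out. The source does not spell out this factorization.

def pointwise_params(c_in, c_out):
    """Weights in a standard 1x1 convolution (bias ignored)."""
    return c_in * c_out

def low_rank_pointwise_params(c_in, c_out, rank):
    """Weights in the assumed two-step low-rank 1x1 convolution (bias ignored)."""
    return rank * (c_in + c_out)

def mult_adds(params, height, width):
    """Multiply-adds when the 1x1 weights are applied at every spatial position."""
    return params * height * width

# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, rank = 256, 256, 32
full = pointwise_params(c_in, c_out)
low = low_rank_pointwise_params(c_in, c_out, rank)
print(full, low, full / low)
```

With these illustrative sizes the low-rank form needs a quarter of the weights, and the saving grows as the number of filters shrinks relative to the channel count. The depth-wise residual branch is left out of the count because the source states that it adds no parameters.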
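The multi-scale heatmap fusion idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method in the invention: it assumes one heatmap per joint per scale, nearest-neighbor upsampling to the finest resolution, and plain averaging as the fusion rule. The source does not specify any of these choices, and all function names and values are hypothetical.

```python
# Toy multi-scale heatmap fusion for a single joint, in pure Python.
# ASSUMPTIONS: nearest-neighbor upsampling and averaging are stand-ins;
# each coarse map's size divides the finest map's size evenly.

def upsample_nearest(heatmap, factor):
    """Repeat every cell `factor` times along both axes."""
    out = []
    for row in heatmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def fuse_heatmaps(heatmaps):
    """Upsample all maps to the finest resolution and average them cell by cell."""
    target = max(len(h) for h in heatmaps)
    resized = [upsample_nearest(h, target // len(h)) for h in heatmaps]
    cols = len(resized[0][0])
    return [[sum(h[r][c] for h in resized) / len(resized) for c in range(cols)]
            for r in range(target)]

def locate_joint(heatmap):
    """Return (row, col) of the highest response."""
    _, r, c = max((v, r, c) for r, row in enumerate(heatmap)
                  for c, v in enumerate(row))
    return r, c

# Fine scale: a spurious strong response at (3, 0), the true joint at (1, 2).
fine = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.8, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.9, 0.0, 0.0, 0.0]]
# Coarse scale: responds only in the region that contains the true joint.
coarse = [[0.0, 0.8],
          [0.0, 0.0]]

fused = fuse_heatmaps([fine, coarse])
print(locate_joint(fine), locate_joint(fused))
```

In this toy case the fine map alone picks the spurious peak, while the fused map picks the location supported at both scales. Real fusion layers may be learned rather than fixed averages.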
- This invention runs much faster than state-of-the-art methods while achieving comparable accuracy.
- It has been implemented on mobile devices and runs in real time with robust and accurate performance.
- This invention solves the scaling problem of pose estimation by utilizing multi-scale feature extraction, feature fusion, and multi-scale heatmap estimation and fusion mechanisms.
- Can be applied to detecting human behaviors in monitoring systems.
- Can be applied to human-computer interaction, such as video games that use human body movement as input (e.g. Xbox Kinect).
- Can be applied in mobile apps that require human body movement as input, such as personal fitting and training.
- License
- Partnering
- Research collaboration
Patent Information:
For Information, Contact:
Colin Sullivan
Commercialization Consultant
Northeastern University
Yun Fu
Songyao Jiang
Bin Sun
Human body keypoint detection
Pose Estimation
Pose tracking
Skeleton detection