A multi-modality compatible 3D human pose estimation model

Institute Reference: INV-21105


As machine learning (ML) applications expand into everyday life, data scarcity poses a seemingly insurmountable hurdle for many scientific, industrial, and healthcare applications. Gathering or labeling data can be prohibitively expensive due to sampling cost or strict privacy laws. Data-efficient ML approaches therefore try to exploit structural knowledge of the problem to constrain the model until it is simple enough to be trained correctly with the available data.

In the past few years, monocular 3D human pose estimation from a single RGB image has received significant attention. Pose inference models with competitive performance, however, require supervision with 3D pose ground truth data, or at least known pose priors in their target domain. These data requirements may not be achievable in target applications with data collection constraints.


Technology Overview

Here Northeastern inventors present a heuristic weakly supervised solution, HW-HuP, that estimates 3D human pose in contexts where no ground truth 3D data is accessible, even for fine-tuning. HW-HuP learns a set of pose priors from public 3D human pose datasets, then uses easy-to-access observations from the target domain to iteratively estimate 3D human pose and shape in an optimization and regression hybrid cycle.

In this design, depth data is employed as auxiliary information for supervision, but it is not needed at inference time.
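The optimization and regression hybrid cycle can be sketched in miniature: a regressor predicts pose from the input, an optimization step refines each prediction against the auxiliary depth observation, and the refined estimates supervise the regressor on the next pass. The toy 1-D linear model and all names below are illustrative assumptions, not HW-HuP's actual model or API.

```python
# Toy sketch of an optimization and regression hybrid training cycle.
# Illustrative only: the 1-D linear "regressor" and the synthetic data
# are assumptions for clarity, not HW-HuP's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: inputs x (the "images") and a hidden true pose p_true.
W_true = 2.0
x = rng.normal(size=(100, 1))
p_true = W_true * x                               # never shown to the model
depth = p_true + 0.05 * rng.normal(size=x.shape)  # auxiliary depth observation

W = 0.0  # regressor parameter, learned without any 3D ground truth
for _ in range(20):
    # Regression step: predict pose directly from the input.
    p_hat = W * x
    # Optimization step: refine each prediction toward the depth cue
    # (one damped step on a simple depth-fitting objective).
    p_ref = p_hat - 0.3 * (p_hat - depth)
    # Regression update: fit the regressor to the refined poses,
    # which act as pseudo ground truth for the next cycle.
    W = np.linalg.lstsq(x, p_ref, rcond=None)[0].item()

# W converges toward the depth-consistent mapping (about 2.0 here),
# even though the true poses were never observed directly.
```

At deployment, only the regression half of the cycle would run, which is why no depth sensor is needed for inference.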

To verify their approach, the group deployed their pre‑trained model for 3D in‑bed human pose estimation. The system consisted of: 

(1) A webcam installed on the ceiling to provide a bird's-eye view

(2) Their code to track in-bed 3D human poses

For long-term monitoring overnight, users can replace the webcam with a Kinect depth camera.


Advantages

  • Low cost: a depth sensor is not required for deployment; the approach estimates 3D pose and shape from a low-cost webcam
  • Compatible with other modalities such as thermal, depth, or pressure maps
  • Works in adverse visual conditions such as darkness and heavy occlusion
  • Combines existing general-purpose datasets with easy-to-access observations from the target domain
  • Can train a new model in a novel context: no motion capture device is required, only depth and an inference modality such as RGB


Applications

  • In-bed human pose estimation
  • Bed-bound patient monitoring
  • Driver behavior studies via 3D pose
  • Pilot training in the cockpit
  • Infant 3D pose tracking and study
  • AI fitness coach apps


Opportunities

  • License
  • Partnering
  • Research collaboration


Seeking

  • Commercial partner
  • Development partner
  • Licensing

IP Status

  • Provisional patent


Patent Information:

For Information, Contact:
Mark Saulich
Associate Director of Commercialization
Northeastern University

Inventors:
Sarah Ostadabbas
Shuangjun Liu
Xiaofei Huang
Nihang Fu

Keywords:
Computer Vision and Deep Learning
Data-Efficient Machine Learning
Pose Estimation