Image-to-image translation, which maps images from one domain to another, is receiving growing attention as the need to compare and classify images increases. Existing methods mainly address this task with deep generative models and focus on the relationship between domains. They do not use higher-level, instance-specific information to guide training, so many of the generated images are unrealistic and of low quality.
Technology Overview
In this invention, multi-domain image-to-image translation is achieved using novel deep-learning-based adversarial networks called Segmentation Guided Generative Adversarial Networks (SG GAN). SG GAN can transfer facial attributes (e.g., hair color, gender, age) and morph facial expressions to generate a target image, for example changing a non-smiling face into a smiling one.
This technology provides sharp and realistic results with additional morphable features. It detects faces in the input image and extracts the corresponding semantic segmentations. The translation step then uses trained SG GAN models, which fully leverage semantic segmentation information to guide the image translation process.
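The detect, segment, translate flow described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: `detect_faces`, `segment`, and `generate` are hypothetical callables standing in for a face detector, the segmentor, and the trained generator, and the nested-list image type and box convention are assumptions made only to keep the sketch self-contained.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]        # placeholder image type: rows of pixel values
Box = Tuple[int, int, int, int]  # assumed convention: (top, left, bottom, right)


def crop(image: Image, box: Box) -> Image:
    """Cut the face region out of the full image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def translate_faces(
    image: Image,
    target_attributes: Sequence[int],
    detect_faces: Callable[[Image], List[Box]],  # hypothetical face detector
    segment: Callable[[Image], Image],           # hypothetical segmentor
    generate: Callable[[Image, Sequence[int], Image], object],  # hypothetical generator
    target_segmentation: Optional[Image] = None,
) -> list:
    """Run detection, segmentation, and translation for every detected face."""
    results = []
    for box in detect_faces(image):
        face = crop(image, box)
        extracted = segment(face)
        # Assumption: when no target segmentation is supplied, the face's own
        # segmentation is reused, so attributes change but the shape does not.
        guide = target_segmentation if target_segmentation is not None else extracted
        results.append(generate(face, target_attributes, guide))
    return results
```

Passing a modified `target_segmentation` is how a shape or expression change would be requested in this sketch.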
The SG GAN model consists of three networks (generator, discriminator, and segmentor):
(1) Generator: takes a given image, attributes, and target segmentation as inputs to generate a target image
(2) Discriminator: pushes the generated images toward the target domain distribution and uses an auxiliary attribute classifier, which enables SG GAN to generate images with multiple attributes
(3) Segmentor: imposes semantic information on the generation process
The framework is trained on a large dataset of face images with attribute-level labels.
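One way to picture how the three networks jointly shape the generator is as a weighted sum of three training signals: an adversarial term from the discriminator, an attribute term from the auxiliary classifier, and a segmentation term from the segmentor. The sketch below is an assumption for illustration only. This listing does not state the loss functions or weights, so the non-saturating adversarial loss, the binary and per-pixel cross-entropies, and the `w_attr` and `w_seg` weights are common choices rather than the inventors' method.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def adversarial_loss(d_score_fake: float) -> float:
    """Low when the discriminator scores the generated image as real (near 1)."""
    return -math.log(max(d_score_fake, EPS))


def attribute_loss(predicted: Sequence[float], target: Sequence[int]) -> float:
    """Mean binary cross-entropy between predicted and target attributes."""
    total = 0.0
    for p, t in zip(predicted, target):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target)


def segmentation_loss(predicted: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Mean per-pixel cross-entropy.

    predicted[i] holds class probabilities for pixel i of the generated image;
    target[i] is the class index of that pixel in the target segmentation.
    """
    total = 0.0
    for probs, label in zip(predicted, target):
        total += -math.log(max(probs[label], EPS))
    return total / len(target)


def generator_objective(d_score_fake, attr_pred, attr_target, seg_pred, seg_target,
                        w_attr: float = 1.0, w_seg: float = 1.0) -> float:
    """Weighted sum of the three signals (weights are hypothetical)."""
    return (adversarial_loss(d_score_fake)
            + w_attr * attribute_loss(attr_pred, attr_target)
            + w_seg * segmentation_loss(seg_pred, seg_target))
```

In this sketch, raising `w_seg` makes the generated face follow the target segmentation more closely, at the cost of weighting the other two signals less.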
Key Benefits
- Generates more realistic results with better image quality (sharper and clearer details) after image translation
- Enables additional morphing features during translation (facial attribute reallocation, changing face shape, making the person gradually smile)
- Generates facial semantic segmentations directly from input face images, a step traditionally done by converting the output of a pre-trained face landmark detector
Commercial Applications
- Entertainment: social media apps, modeling and fashion e-commerce sites, and video editing
- Automatic criminal sketch and forensic tools for security and law applications, such as human tracking and missing-children verification and recognition
Patent Information:
For Information, Contact:
Barbara Finer
Northeastern University
Yun Fu
Songyao Jiang