Introduced by Yu-Chieh Lee, Sierra Park, Leo Li, Chris Chin
DeepMasque takes any portrait as input and outputs a state-of-the-art alpha matte: an image in which black marks the background and white marks the foreground (the person), and which can be used to place the subject on a new background. DeepMasque automates the process of separating the subject from the background in an image, and it has the potential to replace an entire sector of tasks in Hollywood, VR, and 3D printing, all of which still rely on green-screen techniques.
In its current form, our project is useful for amateur photographers and filmmakers looking to change the background of an image: we kept the end-to-end runtime to about 5 minutes, shorter than manually segmenting an image would take. Our project could be extended to segment video files, which would go a long way toward automating the green-screen technique. We are also currently working on creating a 3D model from the segmented image.
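For context, an alpha matte drives the standard compositing equation: each output pixel is alpha times the foreground plus (1 - alpha) times the new background. Below is a minimal NumPy sketch of that final step; the function name and argument conventions are our own illustration, not part of DeepMasque:

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a subject onto a new background using an alpha matte.

    foreground, background: HxWx3 uint8 images of the same size.
    alpha: HxW array in [0, 1] (1 = subject, 0 = background).
    """
    alpha = alpha.astype(np.float32)[..., None]  # broadcast over RGB channels
    # Standard compositing equation: out = alpha * fg + (1 - alpha) * bg
    out = (alpha * foreground.astype(np.float32)
           + (1.0 - alpha) * background.astype(np.float32))
    return out.astype(np.uint8)
```

With a matte from DeepMasque, `composite(portrait, new_background, matte)` would yield the recomposed image.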
Our algorithm involves three main steps:
- Our preprocessing stage captures the positional offsets of each portrait input with respect to a reference image using deep metric learning, a facial-featurization technique. We a) identify 49 points that capture distinct facial features in both images, b) find the affine transformation between these two sets of points, and c) use that transform to warp the reference's mean mask and positional offsets into the input's coordinate frame (a sketch of this alignment appears after this list).
- We take the 3 channels output by preprocessing, together with the portrait, as inputs to a 20-layer encoder-decoder network called PortraitFCN+, which outputs an unrefined alpha matte. We train PortraitFCN+ on Amazon EC2 against the ground-truth alpha matte (i.e., the true subject area) of each image. From this output we generate a trimap, an alpha matte with an additional gray region representing "unknown," by marking the 10 pixels on either side of the subject/background boundary as unknown (a trimap-construction sketch appears after this list).
- The trimap is then passed through two refinement stages: a) KNN matting applies k-nearest neighbors to classify the unknown (gray) region, and b) a ResNet corrects the small errors that may remain from PortraitFCN+ (a simplified KNN sketch appears after this list). The output is the final alpha matte. Our refinement algorithm is much less computationally expensive than the current state-of-the-art refinement procedure, DIM, while maintaining the same accuracy: a 97% IoU. In fact, we show that our refinement algorithm runs on a LaunchPad setup with 4 KB of memory, a minuscule footprint compared to an iPhone's 64-256 GB.
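To make the alignment step concrete, here is a minimal sketch of steps b) and c) using OpenCV's affine estimation. The landmark detection that produces the 49 points (step a) is omitted, and all function and argument names are illustrative rather than our actual implementation:

```python
import numpy as np
import cv2

def align_reference(ref_points, input_points, mean_mask, x_offset, y_offset, size):
    """Warp the reference's mean mask and positional-offset channels
    into the input portrait's coordinate frame.

    ref_points, input_points: (49, 2) arrays of corresponding facial landmarks.
    mean_mask, x_offset, y_offset: HxW float channels defined on the reference.
    size: (width, height) of the input portrait.
    """
    # Least-squares affine transform mapping reference landmarks onto the input's.
    M, _ = cv2.estimateAffine2D(ref_points.astype(np.float32),
                                input_points.astype(np.float32))
    # Apply the same transform to each auxiliary channel.
    warp = lambda channel: cv2.warpAffine(channel, M, size)
    return warp(mean_mask), warp(x_offset), warp(y_offset)
```

The three warped channels are what we concatenate with the portrait's RGB as input to PortraitFCN+.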
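The trimap construction can be sketched with simple morphological operations: dilating and eroding the hard matte by 10 pixels leaves a gray band of unknown pixels around the boundary. This assumes the unrefined matte has been thresholded to a binary image; names here are illustrative:

```python
import numpy as np
import cv2

def make_trimap(binary_matte, band=10):
    """Turn a hard segmentation into a trimap by marking a band of
    `band` pixels on either side of the subject boundary as unknown.

    binary_matte: HxW uint8 array, 255 = foreground, 0 = background.
    Returns an HxW uint8 trimap: 255 = fg, 0 = bg, 128 = unknown.
    """
    kernel = np.ones((2 * band + 1, 2 * band + 1), np.uint8)
    dilated = cv2.dilate(binary_matte, kernel)  # grows the foreground outward
    eroded = cv2.erode(binary_matte, kernel)    # shrinks the foreground inward
    trimap = np.full(binary_matte.shape, 128, dtype=np.uint8)
    trimap[eroded == 255] = 255  # confidently foreground
    trimap[dilated == 0] = 0     # confidently background
    return trimap
```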
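Finally, the spirit of the KNN stage can be shown with scikit-learn: each unknown pixel is assigned the majority label of its k nearest known neighbors in a joint color-and-position feature space. This is a simplified classification sketch rather than a full KNN-matting solver, and the names and parameters are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_refine(image, trimap, k=10, spatial_weight=0.5):
    """Resolve the unknown (gray) band of a trimap via k-nearest neighbors.

    image: HxWx3 float array in [0, 1]; trimap: HxW uint8 (0/128/255).
    Returns a hard alpha matte with values 0.0 or 1.0 per pixel.
    """
    h, w = trimap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Feature per pixel: RGB color plus scaled spatial coordinates, so that
    # neighbors are close in both appearance and position.
    coords = np.stack([ys / h, xs / w], axis=-1).reshape(-1, 2)
    feats = np.hstack([image.reshape(-1, 3), spatial_weight * coords])
    flat = trimap.reshape(-1)
    known = flat != 128
    labels = (flat[known] == 255).astype(int)  # 1 = foreground, 0 = background
    clf = KNeighborsClassifier(n_neighbors=k).fit(feats[known], labels)
    alpha = (flat == 255).astype(np.float32)
    alpha[~known] = clf.predict(feats[~known]).astype(np.float32)
    return alpha.reshape(h, w)
```

Because this step only touches the thin unknown band rather than the whole image, it helps explain why the refinement is so much cheaper than DIM-style refinement.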