Driven by computer vision and deep learning techniques, a new wave of imaging attacks has recently emerged which allows anyone to easily create highly realistic "fake" videos. These fake videos are known as DeepFakes. While highly entertaining at times, DeepFakes can be used to disrupt society, and some would argue that the pre-shock has already begun. A rogue DeepFake that goes viral can spread misinformation across the internet like wildfire.
Because DeepFakes contain a unique combination of realism and novelty, they are more difficult to detect on social networks than traditional "bad" content like pornography and copyrighted movies. Video hashing might work for finding duplicates or copyright-infringing content, but it is not good enough for DeepFakes. To fight face-manipulating DeepFake AI, one needs an even stronger AI.
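To see why hashing falls short, consider this minimal sketch of frame-level perceptual hashing, assuming the third-party Pillow and imagehash Python packages (the filenames are hypothetical). A re-encoded copy of known content lands within a small Hamming distance of its original hash, but a freshly generated DeepFake matches nothing in the index:

```python
# A minimal perceptual-hashing sketch, assuming the Pillow and imagehash
# packages. All filenames are hypothetical.
from PIL import Image
import imagehash

def frame_hash(path):
    # Average hash: downsample, threshold against the mean, pack into 64 bits.
    return imagehash.average_hash(Image.open(path))

known = frame_hash("known_movie_frame.png")    # hash of indexed content
reupload = frame_hash("reencoded_frame.png")   # re-encode of the same frame
deepfake = frame_hash("deepfake_frame.png")    # brand-new generated frame

# Subtracting two hashes gives their Hamming distance.
print(known - reupload)   # small: near-duplicates hash close together
print(known - deepfake)   # large: a novel fake matches nothing in the index
```

Duplicate detection only flags content that already exists somewhere in an index; a DeepFake is, by construction, new content.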
As today's DeepFakes are based on Deep Learning, and Deep Learning tools like TensorFlow and PyTorch are accessible to anybody with a modern GPU, such face manipulation tools are particularly disruptive. The democratization of Artificial Intelligence has brought us near-infinite use-cases. After the DeepDream phenomenon of 2015 and the Deep Style Transfer art apps of 2016, 2018 is the year of the DeepFake. Today's computer vision technology allows a hobbyist to create a DeepFake video of just about any person they want performing any action they want, in a matter of hours, using commodity computer hardware.
What is a Deep Fake?
A deep fake is a video produced by a modern computer vision puppeteering face-swap algorithm, which can generate a video of target person X performing target action A, usually given a video of another person Y performing action A. The underlying system learns two face models, one of target person X and one of person Y, the person in the original video. It then determines a mapping between the two faces, which can be used to create the resulting "fake" video. Techniques for facial reenactment were pioneered by movie studios for driving character animations from real actors' faces, but these techniques are now emerging as deep learning-based software packages, letting deep convolutional neural networks do most of the work during model training.
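To make the two-face-model idea concrete, here is a minimal PyTorch sketch of the shared-encoder / two-decoder autoencoder popularized by the open-source DeepFake packages. The layer sizes and losses are illustrative assumptions, not any particular published architecture:

```python
# A minimal sketch of the shared-encoder / two-decoder DeepFake autoencoder.
# Layer sizes are illustrative assumptions (64x64 RGB face crops).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(128 * 16 * 16, 512),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 16, 16))

encoder = Encoder()
decoder_x, decoder_y = Decoder(), Decoder()   # one decoder per identity
params = (list(encoder.parameters()) + list(decoder_x.parameters())
          + list(decoder_y.parameters()))
opt = torch.optim.Adam(params, lr=5e-5)

def train_step(faces_x, faces_y):
    # Both identities share the encoder, so it is forced to capture pose and
    # expression; each decoder learns to render one specific face.
    loss = (F.l1_loss(decoder_x(encoder(faces_x)), faces_x)
            + F.l1_loss(decoder_y(encoder(faces_y)), faces_y))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# At test time, feed person Y's face through encoder + decoder_x: the output
# is person X wearing Y's pose and expression -- one "fake" frame.
```

The shared encoder is the "mapping between the two faces" described above: it learns an identity-agnostic representation of pose and expression that either decoder can turn back into a photorealistic face.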
Consider the following collage of faces. Can you guess which ones are real and which ones are DeepFakes?
It is not so easy to tell which images are modified and which ones are unadulterated. And if you do a little bit of searching for DeepFakes (warning: unless you are careful, you will encounter lots of pornographic content), you will notice that the faces in those videos look very realistic.
How are Deep Fakes made?
While there are conceptually many different ways to make Deep Fakes, today we'll focus on two key underlying techniques: face detection from videos, and deep learning for creating frame alignments between source face X and target face Y.
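As a concrete illustration of the first technique, here is a hedged sketch of the face-extraction step using OpenCV and dlib (a common tool choice, though by no means the only one). Every frame of the input video is scanned for faces, and the resulting crops become the training data for the face models:

```python
# A sketch of per-frame face extraction, assuming OpenCV and dlib.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def extract_faces(video_path):
    cap = cv2.VideoCapture(video_path)
    faces = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for rect in detector(gray):
            # Clamp to the frame and crop each detected face.
            x, y = max(rect.left(), 0), max(rect.top(), 0)
            faces.append(frame[y:y + rect.height(), x:x + rect.width()])
    cap.release()
    return faces
```

A real pipeline would also run a facial landmark detector on each crop so the faces can be aligned to a canonical pose before training.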
A lot of this research started with the Face2Face work presented at CVPR 2016. This paper was a modernization of the group's earlier SIGGRAPH paper and focused a lot more on the computer vision details. At that time the tools were good enough to create SIGGRAPH-quality videos, but it took a lot of work to put together a facial reenactment rig. In addition, the underlying algorithms did not use any deep learning, so a lot of domain knowledge (i.e., face modeling expertise) went into making these algorithms work robustly. The TUM/Stanford guys filed their real-time facial reenactment patent in 2016 [4], and have more recently worked on FaceForensics to detect such manipulated imagery.
In addition to the Face2Face guys (who now have a handful of similarly themed papers), it is interesting to note that a lot of the key early ideas in face puppeteering were pioneered by Ira Kemelmacher-Shlizerman, who is now a computer vision and graphics assistant professor at the University of Washington. She worked on early face puppeteering technology for the 2010 paper Being John Malkovich, continued with the Photobios work, and later founded Dreambit (based on a SIGGRAPH 2016 paper), which was acquired by Facebook. :-)
The origin of Ira's Dreambit system is the Transfiguring Portraits SIGGRAPH 2016 paper [6]. What's important to note is that this is 2016, and we're starting to see some use of Deep Learning. The Transfiguring Portraits work used a big mix of features, including some CNN features computed from early Caffe networks. It was not an entirely easy-to-use system at that point, but it was good enough to make SIGGRAPH videos, fast enough to generate crisp outputs in about a minute, and cool enough for Facebook to acquire.
Fighting against DeepFakes
There are now published algorithms that try to battle DeepFakes by determining whether faces/videos are fake or not. FaceForensics introduces a large DeepFake dataset based on the group's earlier Face2Face work. This dataset contains both real videos and "fake" Face2Face output videos. More importantly, the new dataset is big enough to train a deep learning system to determine whether an image is counterfeit. In addition, they can do both 1.) determine which pixels have likely been manipulated, and 2.) perform a deep cleanup stage to make even better DeepFakes.
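For a rough sense of what such a detector looks like, here is a sketch that fine-tunes a stock torchvision ResNet-18 as a binary real/fake classifier. This is the generic transfer-learning recipe, an assumption on my part, not the actual network from the FaceForensics paper:

```python
# A hedged sketch of a binary real/fake face classifier via transfer
# learning; the FaceForensics paper uses its own architecture.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)   # two classes: real vs. fake

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(face_crops, labels):
    # face_crops: a batch of aligned face images; labels: 0 = real, 1 = fake.
    optimizer.zero_grad()
    loss = criterion(model(face_crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The interesting part is not the architecture but the data: without a large, labeled corpus of fakes like the FaceForensics dataset, there is nothing for such a classifier to learn from.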
Another fake-detection approach, this time from a Berkeley AI Research group, called Image Splice Detection, focuses on detecting where an image was spliced to create a false composite image. This allows them to determine which part of the image was likely "photoshopped," and the technique is not specific to faces. And because this is a 2018 paper, it should not be a surprise that this kind of work is all based on deep learning techniques.
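The self-consistency intuition behind splice detection can be sketched as follows: embed patches of the image and flag the patches whose features disagree with the consensus. The patch_embedding function below is a hypothetical placeholder, not the learned network from the paper, which is trained to predict whether two patches share photographic metadata:

```python
# A rough sketch of patch self-consistency scoring for splice detection.
# patch_embedding is a hypothetical stand-in for a learned CNN.
import numpy as np

def patch_embedding(patch):
    # Placeholder feature; the real system uses a network trained to tell
    # whether two patches could come from the same camera/photo.
    return patch.reshape(-1).astype(np.float32) / 255.0

def consistency_scores(image, patch=64, stride=64):
    h, w = image.shape[:2]
    crops = [image[y:y + patch, x:x + patch]
             for y in range(0, h - patch + 1, stride)
             for x in range(0, w - patch + 1, stride)]
    feats = np.stack([patch_embedding(c) for c in crops])
    consensus = feats.mean(axis=0)   # the dominant "source" signature
    # Patches far from the consensus are candidate spliced regions.
    return np.linalg.norm(feats - consensus, axis=1)
```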
Concluding Remarks
The new DeepFake tools were pioneered in the early 2010s and were producing SIGGRAPH-quality results by 2015. It was only a matter of years until DeepFake generators became publicly available. 2018's DeepFake generators, built on top of open-source Deep Learning libraries, are much easier to use than the research systems from only a few years back. Today, just about any hobbyist with minimal computer programming knowledge and a GPU can build their own DeepFakes.
Just as DeepFakes are getting better, Generative Adversarial Networks are showing more promise for photorealistic image generation. It is likely that we will soon see lots of exciting new work on both the generative side (DeepFake generation) and the discriminative side (DeepFake detection and image forensics), incorporating more and more ideas from the machine learning community.