
DEEPFAKER: A Unified Evaluation Platform for Facial Deepfake and Detection Models

2 Citations, 2023
Li Wang, Xiangtao Meng, Dan Li
ACM Transactions on Privacy and Security

A large-scale empirical study of facial deepfake and detection models finds that detection methods generalize poorly to samples generated by different deepfake methods, and that there is no significant correlation between the anti-detection ability and the visual quality of deepfake samples.

Abstract

Deepfake data contains realistically manipulated faces, and its abuse poses a huge threat to security- and privacy-critical applications. Intensive research from academia and industry has produced many deepfake and detection models, leading to a constant race between attack and defense. However, due to the lack of a unified evaluation platform, many critical questions on this subject remain largely unexplored. How strong is the anti-detection ability of existing deepfake models? How generalizable are existing detection models against different deepfake samples? How effective are the detection APIs provided by cloud-based vendors? How evasive and transferable are adversarial deepfakes in lab and real-world environments? How do various factors impact the performance of deepfake and detection models? To bridge the gap, we design and implement DEEPFAKER, a unified and comprehensive deepfake detection evaluation platform. Specifically, DEEPFAKER integrates 10 state-of-the-art deepfake methods and 9 representative detection methods, while providing a user-friendly interface and a modular design that allows for easy integration of new methods. Leveraging DEEPFAKER, we conduct a large-scale empirical study of facial deepfake and detection models and draw a set of key findings: (i) the detection methods generalize poorly to samples generated by different deepfake methods; (ii) there is no significant correlation between the anti-detection ability and the visual quality of deepfake samples; (iii) the current detection APIs have poor detection performance, and adversarial deepfakes can achieve an attack success rate of about 70% on all cloud-based vendors, underscoring an urgent need to deploy effective and robust detection APIs; (iv) the detection methods in the lab are more robust against transfer attacks than the detection APIs in the real-world environment; and (v) deepfake videos may not always be more difficult to detect after video compression.
We envision that DEEPFAKER will benefit future research on facial deepfake and detection.