A wild testbed is built by gathering texts from various human writings and deepfake texts generated by different LLMs to showcase the challenges of deep-fake text detection in a wild testbed and show empirical results on automatic detection meth-ods.
Recent advances in large language models have enabled them to reach a level of text generation comparable to that of humans. These models show powerful capabilities across a wide range of content, including news article writing, story generation, and scientific writing. Such capability further narrows the gap between human-authored and machine-generated texts, highlighting the importance of deepfake text detection to avoid potential risks such as fake news propagation and plagia-rism. However, previous work has been limited in that they testify methods on testbeds of specific domains or certain language models. In practical scenarios, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a wild testbed by gathering texts from various human writings and deepfake texts generated by different LLMs. Human annotators are only slightly better than random guessing at identifying machine-generated texts. Empirical results on automatic detection meth-ods further showcase the challenges of deep-fake text detection in a wild testbed. In addition, out-of-distribution poses a greater challenge for a detector to be employed in realistic application scenarios. We release our resources at https://github.com/yafuly/ DeepfakeTextDetect .