Westlake University Team Develops Fast-DetectGPT, Capable of Identifying AI-Generated Text
In a recent study, Professor Zhang Yue and his team at Westlake University explored the fundamental differences between machine-generated and human-written text and developed a detection approach they describe as “fast, accurate, robust, and low-cost,” clearing the way for the practical application of machine-generated text detection technology.
Specifically, they proposed a new hypothesis for detecting machine-generated text and built a detection method on it called Fast-DetectGPT. They argue that because large language models learn collective human writing behavior through empirical risk minimization, their output exhibits clear statistical regularities. Human writing, shaped by individual cognition and intrinsic causality, carries distinct individual features. As a result, humans and machines differ noticeably in the words they choose in a given context, while the differences between one machine and another are far less pronounced. By leveraging these characteristics, the team can use a set of models and methods to detect text generated by different source models. They also used a smaller pre-trained language model, with fewer than 10 billion parameters, to examine text generated by large language models such as ChatGPT and GPT-4.
In other words, Fast-DetectGPT needs no training: it can directly use small open-source language models to detect text generated by a variety of large language models. Compared with DetectGPT, it speeds up detection by a factor of 340 and improves detection accuracy by 75%. In detecting text generated by ChatGPT and GPT-4, it can even exceed the accuracy of commercial systems such as GPTZero.
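The article does not spell out the statistic behind this word-choice comparison. As a rough illustration only, the sketch below assumes a score in the spirit of the conditional probability curvature used by Fast-DetectGPT: for each position, compare the log-probability of the token actually written with the log-probability the scoring model would expect from its own samples, then normalize. The function names (`log_softmax`, `curvature_score`) and the NumPy-only setup are hypothetical, and the logits are assumed to come from whatever small open-source model is used for scoring.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def curvature_score(logits, token_ids):
    """Illustrative detection score (assumed form, not the team's code).

    logits:    array of shape (seq_len, vocab_size); row j holds the scoring
               model's logits for the token at position j given its prefix.
    token_ids: array of shape (seq_len,); the tokens actually in the text.
    Higher scores suggest the text follows the model's preferred word choices.
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)

    # Log-probability of each observed token.
    observed = log_probs[np.arange(len(token_ids)), token_ids]
    # Mean and variance of the log-probability if the token at each
    # position were sampled from the model's own distribution.
    expected = (probs * log_probs).sum(axis=-1)
    variance = (probs * log_probs ** 2).sum(axis=-1) - expected ** 2

    # Normalized gap between observed and expected log-likelihood.
    # Assumes the distributions are not all uniform (variance > 0).
    return (observed.sum() - expected.sum()) / np.sqrt(variance.sum())
```

In practice a text would be flagged when its score exceeds a threshold chosen on held-out data; the sketch does not assume any particular threshold.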
The researchers say that the Fast-DetectGPT algorithm can easily be applied with various pre-trained language models, and that it works well across different languages and types of content.
In the future, Fast-DetectGPT could be used on social platforms to identify fake news, on shopping platforms to suppress fake product reviews, and in schools or research institutions to identify machine-generated articles.
In this way, the potential harm caused by the extensive use of large language models can be mitigated, helping to build trustworthy AI systems. “We are also considering deploying Fast-DetectGPT to Internet services to provide a wide range of real-time detection services,” the researchers said.