AI-generated content detection tools put to the test

OpenAI’s ChatGPT, a chatbot built on the company’s Generative Pre-trained Transformer (GPT) models, has been the talk of the town since its launch in November 2022. The AI chatbot had more than a million users in just 4 days and surpassed 100 million active users in just two months, a milestone that took TikTok more than 9 months to reach.

However, its ability to understand the meaning and context of text inputs and provide almost human-like responses has caused consternation in a number of areas and industries in which original human-generated content is valued. This includes education, content marketing, publishing, journalism and law.

The biggest questions in these fields are “How do we distinguish between AI and human-written text?” and “How can we detect AI-generated content?”

But first, how does ChatGPT work?

To differentiate between AI and human-written text, one must delve deep into how platforms like ChatGPT are built.

ChatGPT works by using a deep learning algorithm called a transformer, which is a type of neural network architecture that is particularly effective for natural language processing (NLP) tasks. The model has been trained on a massive corpus of text data from the internet, including books, articles and websites.

This training data has been pre-processed and fed into ChatGPT in a way that allows it to learn patterns and relationships between words and phrases.

When a user inputs a question or statement into ChatGPT, the model processes the text and generates a response based on its training data and its understanding of the context and meaning of the input.
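As a rough intuition for what "learning patterns and relationships between words" means, the toy sketch below counts which word follows which in a tiny corpus and then generates text by repeatedly picking the most frequent next word. This is a simple bigram counter, not a transformer, and it is nowhere near how ChatGPT is actually built; the corpus and the function names `train_bigrams` and `generate` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=5):
    """Greedily extend `start` by picking the most frequent follower."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last word was never followed by anything
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical training text, chosen only to keep the example small.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(generate(model, "the", max_words=2))
```

A real language model replaces these raw counts with a neural network that weighs the whole preceding context, but the basic idea of predicting a plausible next word from patterns in the training text is the same.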

Five sample characteristics

The model behind ChatGPT is pre-trained with a technique often called “unsupervised learning” (more precisely, self-supervised learning), which means it does not require explicit instructions or labels to learn from text; OpenAI then fine-tunes it using human feedback. As a language model, ChatGPT can perform a wide variety of NLP tasks, including text completion, question answering and language translation.

Its ability to generate coherent and realistic responses to complex prompts has made it a valuable tool for a wide range of applications, including chatbots, virtual assistants and language-based games and services.

Needless to say, it’s still extremely hard to detect AI-generated content. One way to go about this manually is to examine five key characteristics of the sample:

Consistency: AI-generated text is typically consistent in its style, tone, and vocabulary, whereas human-written text may exhibit more variation and nuances.
Coherence: AI-generated content can sometimes lack coherence, particularly when responding to complex or nuanced prompts. Human-written text, on the other hand, is typically more coherent and follows a logical structure.
Originality: AI-generated text may sometimes contain repetitive or formulaic phrases or patterns, while human-written text is more likely to be original and creative.
Errors: AI-generated content is more prone to error than human-written text, particularly in areas where the model has not been trained extensively.
Context: The platform may sometimes struggle to understand the context of a given prompt, leading to inappropriate or irrelevant responses, while human-written text is more likely to be tailored to the specific context and audience.
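Two of the characteristics above, consistency of vocabulary and repetitive phrasing, can be roughly quantified. The sketch below computes a type-token ratio (unique words divided by total words) and the share of three-word sequences that occur more than once. These are crude proxies chosen for illustration, not validated detection signals, and the article does not suggest any thresholds; the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Unique words / total words; lower values mean less varied vocabulary."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def repeated_trigram_share(text):
    """Share of three-word sequences that appear more than once."""
    tokens = tokenize(text)
    trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)

# A deliberately formulaic sample sentence, made up for this example.
sample = "it is important to note that it is important to note that"
print(type_token_ratio(sample), repeated_trigram_share(sample))
```

Numbers like these only make sense when compared against other samples from the same author or genre; a low ratio on its own does not show that a text was machine-written.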

Why not automate it? Ever since ChatGPT hit the news, many software companies — including OpenAI — have launched authentication tools that help users identify text written by AI software. In this article, we examine some of the top automated AI-content detection tools and put them to the test.

A snapshot of Content at Scale’s AI Detector suggesting that some phrases from Shakespeare’s A Midsummer Night’s Dream were generated by AI.

In a recent blog post, OpenAI shared a link to a new classifier tool that can differentiate between text created by humans and that generated by various AI systems. However, they acknowledge that the tool is not entirely reliable at this stage.

While it may be impossible to detect all AI-written text, the researchers believe that good classifiers can identify indicators that suggest AI generation. The tool may be useful in cases of academic dishonesty and when AI chatbots are posing as humans, according to the post.

The new classifier correctly identified 26% of AI-written English texts as likely AI-written (true positives), but it also falsely flagged human-written text as AI-written 9% of the time (false positives). OpenAI noted that the reliability of the tool generally increases with the length of the input text, and that it is unreliable on texts shorter than 1,000 characters.
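To see what the two rates reported above mean in practice, the sketch below applies Bayes' rule to estimate how often a flag from the classifier would be correct. It treats 26% as the true positive rate and 9% as the false positive rate, as reported; the share of AI-written documents in the pool being checked (the base rate) is not given in the source, so the values used here are hypothetical.

```python
def precision(tpr, fpr, base_rate):
    """Probability that a flagged text really is AI-written."""
    true_pos = tpr * base_rate            # AI-written and flagged
    false_pos = fpr * (1 - base_rate)     # human-written but flagged
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0

TPR = 0.26  # share of AI-written texts correctly flagged (from the article)
FPR = 0.09  # share of human-written texts wrongly flagged (from the article)

# Base rates below are assumptions, chosen only to show the effect.
for base_rate in (0.1, 0.5):
    share = precision(TPR, FPR, base_rate)
    print(f"{base_rate:.0%} AI-written -> {share:.0%} of flags are correct")
```

Under these assumptions, if only one in ten submitted texts were AI-written, roughly a quarter of the flags would be correct, which helps explain why OpenAI cautions against using the tool as the sole basis for a decision.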

The tool is recommended for use only with English text and it is not suitable for checking code. OpenAI cautions that the tool should not be the primary decision-making tool, but should instead be used in conjunction with other methods to determine the source of a piece of text. Each document is labeled as either “very unlikely,” “unlikely,” “unclear if it is,” “possibly” or “likely” AI-generated.

In all honesty, we did not have much hope for the platform that considered Macbeth to be “AI generated”, but the results were on point. To start off, we ran William Shakespeare’s The Tempest through the platform and the classifier rated it “very unlikely” to be AI-generated, which essentially means human-written.

In the second run, we provided the platform with an article written by ChatGPT and it accurately pointed out that the text was “likely” to be AI-generated.

In the last test, we tried to trick the platform by using two AI tools in sequence: ChatGPT to write the text and QuillBot to paraphrase it. Again, the results were somewhat accurate. This time, the classifier considered the text to be “possibly” AI-generated, which is reasonable given that the text had been altered after generation.

Content at Scale

Founded in 2021, content automation platform Content at Scale launched the “AI Detector”, claiming that it “works at a deeper level than a generic AI classifier and detects robotic sounding content.”

What’s interesting is the way the company positions this tool. Unlike its counterparts, the freely available AI Detector is positioned as a first step toward buying Content at Scale’s flagship content generator, one that claims to produce “undetectable” AI-generated content by tapping into multiple layers with three AI components: NLP, semantic analysis algorithms and SERP parsing capabilities. In their words, “It’s so human-like that it bypasses AI content detection!”

For what it’s worth, this reporter tried the AI Detector and found the results unsatisfactory. We first tested it with Shakespeare’s A Midsummer Night’s Dream (which, as we all know, is human-written) and the platform got it right for the most part. Oddly enough, it flagged several passages as possibly AI-generated, which in this case they were not.

For the second test, we provided the platform with an article written by ChatGPT, and it failed. Even though there was no human intervention in writing this article, the platform gave it an 83% human content score.

Although there was no need to run it through a third test, we paraphrased the same article using another AI-powered tool (QuillBot) and gave the AI Detector one more shot; the results were no different. On the positive side, the human content score declined to 75%, hinting at AI involvement.

Copyleaks AI

Stamford, Connecticut-based anti-plagiarism software company Copyleaks recently expanded its product portfolio with an enterprise solution designed to detect whether digital content was written by a human or generated by AI, including ChatGPT.

The platform claims an accuracy rate of 99.12%, along with enterprise-level LMS and API integration capabilities that enable educational institutions or businesses to add the AI Content Detector to their native platforms. Multi-language detection is also a key feature, with support for English, German, Spanish, French and Portuguese. The company also offers an AI Content Detector Chrome extension to help users verify content across the internet, including social media, news articles and consumer reviews.

Among our test candidates, the platform showed the highest accuracy. For the human content test, it accurately detected that the text was written by a human. Similarly, the platform showed a 99.7% probability of AI-generated content when we provided it with text from ChatGPT. Even in the last test, which featured paraphrased AI-generated text, the platform indicated that the content had a 99.9% chance of being written by AI.

As technology advances, AI-assisted content production is bound to go mainstream, and with that, AI content detectors will improve.

The platforms we tested were just a few of the many that exist in the market. Other detectors include Writer.com, Corrector and Originality.ai.

Do give them a shot!
