As of July 20, OpenAI has quietly pulled the plug on its A.I. detection software, AI Classifier, citing its “low rate of accuracy.” If this is the first time you’re hearing of this, don’t be surprised: OpenAI buried the news so deep that it took a week for anyone to even notice it was there.
The AI Classifier’s end came not with a bang but a whimper. Rather than announcing the change in a statement, OpenAI took down AI Classifier’s landing page, which now simply displays an error notification, and added two sentences announcing the tool’s discontinuation to the top of the January blog post that first launched the product. In the updated post, OpenAI said the company is “researching more effective provenance techniques.”
OpenAI declined to comment beyond the January blog post.
AI Classifier’s inaccuracy is not exactly a shock, given that its initial announcement spent more words detailing its caveats and limitations than its strengths. The detection tool correctly identified only 26 percent of A.I.-written text and, more alarmingly, produced false positives (flagging human-written text as A.I.-generated) 9 percent of the time, according to OpenAI.
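To see why a 9 percent false positive rate is so damaging in practice, consider a rough back-of-the-envelope calculation. The sketch below applies OpenAI’s published figures to a hypothetical batch of essays; the assumption that 10 percent of submissions are A.I.-written is purely illustrative and does not come from OpenAI.

```python
# Back-of-the-envelope math using OpenAI's published figures for AI Classifier:
# a 26% true positive rate and a 9% false positive rate. The base rate of
# A.I.-written essays is an assumed number for illustration only.
sensitivity = 0.26           # share of A.I.-written text correctly flagged
false_positive_rate = 0.09   # share of human-written text wrongly flagged
base_rate = 0.10             # assumed share of essays that are A.I.-written

flagged_ai = sensitivity * base_rate                  # correct flags
flagged_human = false_positive_rate * (1 - base_rate)  # wrongful flags
precision = flagged_ai / (flagged_ai + flagged_human)

print(f"Chance a flagged essay is actually A.I.-written: {precision:.0%}")
# -> roughly 24%
```

Under those assumptions, fewer than a quarter of flagged essays would actually be A.I.-written; the rarer A.I. use really is, the more the wrongful accusations dominate.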
For educators, unreliable A.I. plagiarism detectors bring far more dangers than benefits
The shuttering of AI Classifier is especially concerning for the education sector. The creation of ChatGPT sent academic institutions scrambling as it became evident that students were growing increasingly reliant on generative A.I. for nearly everything. A.I. detection software was originally seen as one of the few lighthouses available to guide teachers through the stormy sea of A.I.-generated plagiarism, but it looks like that light might be flickering out.
Marc Watkins, a professor at the University of Mississippi who specializes in A.I. in education, views AI Classifier’s shutdown as emblematic of a larger issue.
“This is an acknowledgement that [A.I. detection software] doesn’t really work across the board,” he told Observer, noting that the University of Pittsburgh recently discontinued use of Turnitin’s A.I. detection tool over accuracy concerns.
“Given that the tool was released in January then shut down by July, it seems like a pretty clear indication that this does not work,” Watkins said of AI Classifier.
Watkins isn’t the only person losing faith in A.I. detection technology; in a Twitter poll posted July 25, only 15.3 percent of 667 respondents said they believe it’s possible for anyone to make a consistently accurate detector.
After all, if OpenAI, the company behind ChatGPT, is unable to determine which text was generated by its own models, it’s hard to imagine other detection tools like GPTZero or Originality.AI faring much better.
The consequences of inaccurate A.I. detection run deeper than merely missing a few GPT-written papers. As a University of California student discovered firsthand this spring, A.I. plagiarism detectors can be trigger-happy with their accusations, prompting teachers to wrongfully report students for essays written entirely on their own.
These false positives aren’t random, either. For instance, “non-English speaking students score lower on perplexity, one of the main frameworks A.I. detectors use to determine A.I. generated text,” Watkins said, “which means that non-native speakers will be misflagged at much higher rates.”
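To make the mechanism concrete, here is a minimal sketch of the kind of perplexity check Watkins is describing. It assumes a Hugging Face GPT-2 model as the scoring model, and the flagging threshold is hypothetical, not drawn from any real detector.

```python
# A minimal sketch of perplexity-based A.I. text detection, assuming a
# Hugging Face GPT-2 model as the scorer. The threshold is hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score the text with the language model; lower perplexity means the
    # model finds the text more predictable.
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 60.0  # hypothetical cutoff, not from any real detector

def looks_ai_generated(text: str) -> bool:
    # Detectors of this style flag low-perplexity (highly predictable) text.
    return perplexity(text) < THRESHOLD
```

Because simpler, more formulaic prose is easier for a language model to predict, it scores lower perplexity, which is exactly why writing by non-native speakers can trip the same alarm as machine-generated text.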
While the end of AI Classifier has very little tangible impact (there are thousands more online), it symbolizes something far greater: the widespread ineptitude of the current technological foils to ChatGPT.
Watkins told Observer he does not “advocate or recommend any of the detection tools” presently available, and with OpenAI’s embarrassed retreat from the front lines of classification software, it’s not difficult to see why.