Introducing CriticGPT: OpenAI’s New AI Model Detecting Code Errors
OpenAI researchers have introduced CriticGPT, a new artificial intelligence model designed to detect errors in code generated by ChatGPT. Built on the GPT-4 family, CriticGPT analyzes code and flags potential issues, making subtle errors easier to spot. Researchers trained CriticGPT on a dataset of code with intentionally inserted bugs, teaching it to recognize and annotate a range of coding mistakes.
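The training-data idea described above can be illustrated with a toy sketch: start from working code, insert a known bug, and keep the location and explanation as the target annotation. This is an illustrative assumption about the data format, not OpenAI's actual pipeline; the function and field names are hypothetical.

```python
# Hypothetical sketch of one "tampered" training example: working code,
# an injected bug, and the annotation a critic model should learn to produce.
def make_tampered_example(correct_code, bug):
    # Swap the correct snippet for the buggy one.
    tampered = correct_code.replace(bug["original"], bug["injected"])
    return {
        "code": tampered,
        "annotation": {
            "location": bug["injected"],
            "explanation": bug["explanation"],
        },
    }

correct = "total = sum(values) / len(values)"
bug = {
    "original": "len(values)",
    "injected": "len(values) - 1",
    "explanation": "Off-by-one denominator skews the average.",
}
example = make_tampered_example(correct, bug)
print(example["code"])  # total = sum(values) / len(values) - 1
```

A model trained on many such pairs learns to point at the injected span and explain why it is wrong, which is the behavior the researchers then evaluate on naturally occurring bugs.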
For naturally occurring bugs, annotators preferred CriticGPT's critiques over human-written critiques in 63% of cases. Human-plus-CriticGPT teams also wrote more comprehensive critiques than humans working alone, while confabulating fewer nonexistent problems than CriticGPT on its own.
Additionally, researchers developed a new technique called Force Sampling Beam Search (FSBS) to help CriticGPT generate detailed code reviews. The method provides a knob for trading off how thoroughly CriticGPT searches for issues against how often it fabricates problems that do not exist.
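The trade-off described above can be sketched as a simple candidate-selection loop: generate several candidate critiques, score each with a reward signal plus a tunable bonus for thoroughness, and pick the best. This is a minimal illustration of the precision/recall knob, not the actual FSBS algorithm; `select_critique`, `toy_reward`, and `length_bonus` are hypothetical names.

```python
def select_critique(candidates, reward_fn, length_bonus):
    """Pick the best candidate critique by trading off a reward score
    against the number of claims it makes (a proxy for thoroughness).
    A higher length_bonus favors longer, more exhaustive critiques,
    at the risk of including more fabricated issues."""
    def score(critique):
        return reward_fn(critique) + length_bonus * len(critique["claims"])
    return max(candidates, key=score)

# Toy reward: credit claims that mention the one real bug.
def toy_reward(critique):
    return sum(1.0 for claim in critique["claims"] if "off-by-one" in claim)

candidates = [
    {"claims": ["off-by-one in loop bound"]},
    {"claims": ["off-by-one in loop bound", "variable name unclear"]},
    {"claims": ["missing import", "style nit", "unused variable"]},
]

conservative = select_critique(candidates, toy_reward, length_bonus=0.0)
thorough = select_critique(candidates, toy_reward, length_bonus=0.6)
print(len(conservative["claims"]))  # 1
print(len(thorough["claims"]))      # 2
```

Turning the bonus up pulls in more claims per critique, which mirrors the article's point: more thoroughness comes at the cost of more fabricated findings.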
However, CriticGPT has limitations: it was trained on relatively short ChatGPT responses, which may limit its ability to evaluate the longer, more complex tasks future AI systems will produce. And while CriticGPT reduces confabulations, it does not eliminate them entirely, so human trainers can still make labeling mistakes based on incorrect model output.
The research team notes that CriticGPT is most effective at identifying specific errors that can be pinpointed to a single location in the code.