In March 2023, GPT-4 answered prime-number problems with an impressive 97.6% accuracy. By the June 2023 version, that rate had plummeted to just 2.4%. Users have been voicing concerns about the declining performance of OpenAI’s chatbots, and a recent report from Stanford University and the University of California, Berkeley, confirms these worries.
The report analyzed GPT-3.5’s and GPT-4’s responses in several areas, including math, problem-solving, and code generation, over the four months from March to June 2023. During this period, GPT-4’s accuracy on prime-number problems fell sharply from 97.6% to 2.4%. GPT-3.5, on the other hand, improved on the same task, with its accuracy rising from 7.4% to 86.8%.
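The report does not reproduce its scoring harness here, but a minimal sketch of how such an accuracy figure could be computed looks something like the following. The `ask_model` callable is a hypothetical stand-in for whatever API call returns the chatbot’s reply; everything else is ordinary Python.

```python
def is_prime(n: int) -> bool:
    """Plain trial-division primality check used as ground truth."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_accuracy(ask_model, numbers) -> float:
    """Fraction of numbers for which the model's yes/no answer
    matches whether the number is actually prime.
    `ask_model` is a hypothetical function: prompt string -> reply string."""
    correct = 0
    for n in numbers:
        reply = ask_model(f"Is {n} a prime number? Answer yes or no.")
        said_prime = reply.strip().lower().startswith("yes")
        if said_prime == is_prime(n):
            correct += 1
    return correct / len(numbers)
```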
Both GPT-3.5 and GPT-4 were renowned for their ability to generate code, but their performance in this area also deteriorated over time. In March, GPT-4 produced accurate, ready-to-run scripts more than 50% of the time, while GPT-3.5 managed about 22%. By June, those rates had dropped to 10% and 2%, respectively.
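“Ready-to-run” here means the returned script executes as-is. A rough, hypothetical way to test that property for a Python snippet, not the study’s actual harness, is to strip any markdown fences from the model’s output and try to run it:

```python
import subprocess
import sys
import tempfile

def runs_as_is(generated: str, timeout: float = 10.0) -> bool:
    """Return True if the model's output executes without error.
    Assumes the output is meant to be a standalone Python script;
    fenced code markers like ``` are stripped before running."""
    code = "\n".join(
        line for line in generated.splitlines()
        if not line.strip().startswith("```")
    )
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
```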
The exact cause of these changes in the chatbots’ responses remains unclear because of the opacity of the models. Researcher James Zou suggests that improvements in some areas may inadvertently degrade performance on other tasks.
OpenAI responded to the report, denying claims that it had intentionally made GPT-4 less intelligent. According to VP of Product Peter Welinder, each new version is meant to be smarter than its predecessor. He attributes the reported issues to heavier use of the models, which surfaces problems that previously went unnoticed.
As the debate continues, the performance of these chatbots remains a concern for users and researchers alike, who are urging OpenAI to investigate and resolve the reported decline in accuracy.