Study finds alarming rate of decline in ChatGPT performance
A new study by researchers from the University of California, Berkeley and Stanford University has found that OpenAI's ChatGPT is experiencing a significant decline in performance on several tasks, and may still be getting worse over time.
Researchers analyzed different versions of ChatGPT and developed strict benchmarks to evaluate the model's abilities in mathematical, coding, and visual reasoning tasks. The results showed a striking decline in ChatGPT's performance.
Tests showed that on the mathematical challenge of identifying prime numbers, ChatGPT's accuracy dropped from 97.6% in March to 2.4% in June. The decline was especially noticeable in the chatbot's software coding abilities.
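For context, an accuracy figure like the one above can be computed by asking the model a series of yes/no primality questions and checking each reply against a ground-truth test. The sketch below is a minimal illustration rather than the study's actual harness; the ask_model callable is a hypothetical stand-in for whatever chat-completion call an evaluator would use.

    def is_prime(n: int) -> bool:
        # Ground-truth primality check by trial division.
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    def score_prime_task(ask_model, numbers):
        # Fraction of yes/no primality questions the model answers correctly.
        # `ask_model` is a hypothetical callable, not part of any real API.
        correct = 0
        for n in numbers:
            reply = ask_model(f"Is {n} a prime number? Answer yes or no.")
            predicted_prime = reply.strip().lower().startswith("yes")
            correct += int(predicted_prime == is_prime(n))
        return correct / len(numbers)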
In addition, researchers evaluated reasoning abilities using visual prompts from the Abstraction and Reasoning Corpus (ARC) dataset, and observed a significant decline. The study also found that the share of GPT-4's code generations that were directly executable dropped from 52% in March to 10% in June. These results were obtained using the raw model, meaning no code-interpreter plug-ins were involved.
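The "directly executable" metric can be pictured roughly as follows: the model's raw output is run as-is, and a generation counts as executable only if it runs without any manual clean-up. This is an assumption-laden sketch, not the paper's own evaluation code, and running untrusted model output like this should only ever be done in a sandbox.

    def is_directly_executable(generated: str) -> bool:
        # Rough proxy for "directly executable": the raw generation is compiled
        # and run as Python with no markdown fences stripped, no prose removed,
        # and no code-interpreter plug-in involved.
        try:
            compile(generated, "<generation>", "exec")
            exec(generated, {})  # isolated namespace; real harnesses sandbox this
            return True
        except Exception:
            return False

    def executable_rate(generations: list[str]) -> float:
        # Share of generations that run directly, e.g. 0.52 for 52%.
        return sum(is_directly_executable(g) for g in generations) / len(generations)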
Researchers hypothesize that this may be a side effect of updates OpenAI has made to ChatGPT, such as changes introduced to prevent it from answering dangerous questions. This safety alignment, however, may reduce ChatGPT's usefulness for other tasks: the model now tends to give verbose, indirect answers instead of direct ones.
AI expert Santiago Valderrama suggested on Twitter that a "cheaper, faster" mix of models may have replaced the original ChatGPT architecture. He pointed to rumors that OpenAI is running several smaller, more specialized GPT-4 models that act like one large model but cost less to operate. In his view, this could speed up response times for users while lowering the model's capabilities.
NVIDIA Senior AI Scientist Dr. Jim Fan shared his insights on Twitter, noting that greater safety typically comes at the cost of usefulness. He believes these results can be explained by how OpenAI fine-tunes its models.
He speculated that from March to June, OpenAI focused heavily on safety fine-tuning and did not have time to fully restore other important capabilities. Fan believes other factors may also be at play, including cost-cutting efforts, the introduction of warnings and disclaimers that may "simplify" the model, and a lack of broad feedback from the community.
AI experts suggest that ChatGPT users may need to lower their expectations. The freewheeling idea generator many people first encountered now looks more subdued, and perhaps less impressive.
Even with this decline, ChatGPT remains a very powerful model with many impressive capabilities. The decline may also prove temporary, as OpenAI continues to update the model. ChatGPT therefore remains a valuable tool for a wide range of natural language processing tasks.
How can further deterioration be prevented? Some advocates suggest turning to open models such as Meta's LLaMA, which the community can inspect and debug. Continuous benchmarking and early regression detection are also critical, as sketched below.
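As an illustration of what early regression detection could look like, the sketch below compares a fresh benchmark score against a stored baseline and flags any drop beyond a tolerance. The file name, function name, and 5-point threshold are all hypothetical choices for this example, not part of any existing tool.

    import json
    from pathlib import Path

    BASELINE_FILE = Path("benchmark_baseline.json")  # hypothetical local store
    REGRESSION_TOLERANCE = 0.05  # flag drops of more than 5 percentage points

    def check_for_regression(task: str, score: float) -> bool:
        # Compare a fresh benchmark score to the stored baseline for `task`.
        # Returns True when the score has dropped by more than the tolerance,
        # so a scheduled job can raise an alert before users notice.
        baselines = json.loads(BASELINE_FILE.read_text()) if BASELINE_FILE.exists() else {}
        previous = baselines.get(task)
        if previous is not None and previous - score > REGRESSION_TOLERANCE:
            return True  # regression detected; keep the old baseline for reference
        baselines[task] = score if previous is None else max(score, previous)
        BASELINE_FILE.write_text(json.dumps(baselines, indent=2))
        return False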
ChatGPT's decline is also a reminder that the safety and usefulness of AI models must be balanced: models need to maintain their performance and functionality while remaining safe. That requires careful fine-tuning and optimization to ensure they can meet a wide variety of needs.