Study Done By Apple AI Scientists Proves LLMs Have No Ability to Reason
Slashdot reader Rick Schumann shared this report from the blog AppleInsider:
A new paper from Apple’s artificial intelligence scientists has found that engines based on large language models, such as those from Meta and OpenAI, still lack basic reasoning skills.
The group has proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various large language models (LLMs). Their initial testing reveals that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models. The group investigated the “fragility” of mathematical reasoning by adding contextual information to their queries that a human could understand, but which should not affect the fundamental mathematics of the solution. This resulted in varying answers, which shouldn’t happen…
The study found that adding even a single sentence that appears to offer relevant information to a given math question can reduce the accuracy of the final answer by up to 65 percent. "There is just no way you can build reliable agents on this foundation, where changing a word or two in irrelevant ways or adding a few bits of irrelevant info can give you a different answer," the researchers noted… "We found no evidence of formal reasoning in language models," the study concluded. The behavior of LLMs "is better explained by sophisticated pattern matching," which the study found to be "so fragile, in fact, that [simply] changing names can alter results."
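To make the perturbation concrete, here is a minimal Python sketch of the kind of templated question the report describes: proper names and numbers are drawn from pools, an irrelevant "no-op" clause can be appended, and the ground-truth answer is computed directly from the template's variables, so it is unchanged by either perturbation. The template text, name pool, and helper names below are illustrative assumptions, not code from Apple's benchmark.

```python
import random

# Illustrative GSM8K-style template: names and numbers are placeholders,
# and the correct answer depends only on the numeric variables.
TEMPLATE = (
    "{name} picks {base} kiwis on Friday and {extra} more on Saturday. "
    "How many kiwis does {name} have?"
)

# An irrelevant clause in the spirit of the "no-op" additions the study tested:
# it mentions a number but should not change the arithmetic.
NO_OP = " Five of the kiwis picked on Saturday were a bit smaller than average."

NAMES = ["Sophie", "Liam", "Oliver", "Mia"]  # hypothetical name pool

def make_instance(add_no_op: bool = False, seed: int | None = None):
    """Generate one question/answer pair, optionally with an irrelevant clause."""
    rng = random.Random(seed)
    name = rng.choice(NAMES)
    base, extra = rng.randint(20, 60), rng.randint(5, 30)
    question = TEMPLATE.format(name=name, base=base, extra=extra)
    if add_no_op:
        question += NO_OP
    answer = base + extra  # ground truth is unaffected by names or the no-op clause
    return question, answer

if __name__ == "__main__":
    plain_q, plain_a = make_instance(seed=0)
    noop_q, noop_a = make_instance(add_no_op=True, seed=0)
    print(plain_q, "->", plain_a)
    print(noop_q, "->", noop_a)  # same answer; only the surface wording changed
```

Comparing a model's accuracy on the plain variants against the variants with names changed or the extra clause added is, in essence, what the reported drops (up to 65 percent in the worst cases) are measuring.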