New Apple study challenges whether AI models truly think through problems


Earlier this month, Apple researchers published a study indicating that simulated reasoning (SR) models, including OpenAI’s o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, generate responses that align with pattern-matching from their training data when tackling new problems that demand systematic reasoning.

Benj Edwards for Ars Technica:

The researchers found results similar to those of an April study built on problems from the United States of America Mathematical Olympiad (USAMO), which showed that these same models achieved low scores on novel mathematical proofs.

The new study, titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” comes from a team at Apple…

The researchers examined what they call “large reasoning models” (LRMs), which attempt to simulate a logical reasoning process by producing a deliberative text output sometimes called “chain-of-thought reasoning” that ostensibly assists with solving problems in a step-by-step fashion.

To do that, they pitted the AI models against four classic puzzles — Tower of Hanoi (moving disks between pegs), checkers jumping (eliminating pieces), river crossing (transporting items with constraints), and blocks world (stacking blocks) — scaling them from trivially easy (like one-disk Hanoi) to extremely complex (20-disk Hanoi requiring over a million moves).
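The exponential scaling behind that "over a million moves" figure is easy to verify: the minimal solution to an n-disk Tower of Hanoi takes 2^n − 1 moves, since solving n disks requires solving n − 1 disks twice plus one extra move. A minimal sketch (the function name and peg labels are illustrative, not from the study):

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Recursively generate the optimal move sequence for an n-disk Tower of Hanoi."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, move n-1 disks back on top.
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))

# The minimal move count is 2**n - 1, so difficulty grows exponentially with n.
print(len(hanoi_moves(1)))   # 1 move: the trivial one-disk case
print(len(hanoi_moves(10)))  # 1023 moves
print(2**20 - 1)             # 1048575 moves for the 20-disk case
```

So scaling the puzzle from one disk to twenty multiplies the required move sequence by about a million, which is what lets the researchers dial problem complexity smoothly from trivial to far beyond what any model solves reliably.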

“Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy,” the researchers write. In other words, today’s tests only care if the model gets the right answer to math or coding problems that may already be in its training data — they don’t examine whether the model actually reasoned its way to that answer or simply pattern-matched from examples it had seen before.


MacDailyNews Take: Back in the day, our math teachers always required that we show our work, not just provide the correct answer.




2 Comments

  1. First, today there is no such thing as a functional artificial intelligence. Period. No system has passed the singularity yet.

    The reality is that all the LLMs are just extremely well-trained Pavlov’s dogs. After sufficient training they can respond to inputs based upon that training, and to some users they appear to be intelligent. None of them have the ability to come up with completely new ideas. All responses are limited to the extent of the training.

    Someday there will be a true AI. Will it be in five years? Will it be in 50 years? Will it be in 100 years? No one knows. Anyone who says otherwise is either delusional or lying to you.

    I suspect this post will greatly annoy many readers of MacDailyNews. However, reality is what reality is.

