The LLM Reasoning Debate Heats Up
Three recent papers examine the robustness of reasoning and problem-solving in large language models
Melanie Mitchell Oct 21, 2024
One of the fieriest debates in AI these days is whether or not large language models can reason.
In May 2024, OpenAI released GPT-4o (omni), which, they wrote, “can reason across audio, vision, and text in real time.” And last month they released the o1 model, which they claim performs “complex reasoning” and achieves record accuracy on many “reasoning-heavy” benchmarks.
But others have questioned the extent to which LLMs (even enhanced models such as GPT-4o and o1) solve problems by reasoning abstractly. Instead, their success may be due, at least in part, to matching reasoning patterns memorized from their training data, which would limit their ability to solve problems that differ too much from what they have seen in training.