Unlocking Deeper AI: The Power of Thinking in LLMs

Ever wondered how advanced AI models can tackle truly complex problems with a depth of analysis that seems to mimic human thought? The secret lies in a groundbreaking capability known as “thinking.” This fascinating development is designed to clear key bottlenecks on the path to greater intelligence in AI.

Moving Beyond Fixed Compute

Historically, powerful large language models (LLMs) were designed to respond immediately to requests. This meant they applied a constant amount of computing power at “test time”—the moment you ask a question or give a command—to generate a response. This fixed compute budget restricted how deeply the model could “think” about a problem, limiting its ability to handle extremely hard tasks. Imagine if your brain only spent a fixed millisecond on every problem, no matter its complexity!

Users, quite naturally, desire a more dynamic application of compute. Simple requests should be quick and cost-effective, while very complex ones should allow the model to deliberate for far longer, perhaps even a thousand or a million times more. This is precisely what motivates the “thinking” capability in advanced AI models. In the latest models, you can control this thinking budget, or the number of thinking tokens, directly:

"reasoning": {"effort": "medium"},

Specify low, medium, or high for this parameter, where low favors speed and economical token usage, and high favors more complete reasoning. 
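To make that concrete, here is a minimal sketch of building a request body with a reasoning-effort setting, following the style of OpenAI’s Responses API (linked below). The model name is a placeholder assumption, and the payload is only constructed, not sent; an actual call would go through the official SDK or an HTTP client.

```python
import json

def build_reasoning_request(prompt: str, effort: str = "medium") -> str:
    """Build (but do not send) a request body with a reasoning-effort setting."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    payload = {
        "model": "o4-mini",  # placeholder: substitute a reasoning-capable model
        "input": prompt,
        "reasoning": {"effort": effort},
    }
    return json.dumps(payload)

# "high" asks the model to deliberate longer before answering.
body = build_reasoning_request("Prove that 17 is prime.", effort="high")
```

Validating the effort value up front is a small courtesy: it fails fast locally instead of burning a round trip on a malformed request.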

How Thinking in LLM models Works: An Iterative Internal Dialogue

Mechanically, “thinking” introduces an extra stage before the model commits to its final answer: it can generate additional text internally, creating an iterative loop of computation that spends extra test-time compute. Crucially, this internal loop can run for thousands or even tens of thousands of iterations, providing a proportional increase in computing power before the model decides on its final response. Because it’s a loop, the process is dynamic, meaning the model can learn how many iterations to apply based on the problem’s complexity. It’s as if the model is having an internal monologue to work through a problem.
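A toy stand-in can illustrate the dynamic-compute idea. Here the “internal thoughts” are Newton’s-method updates for a square root rather than generated tokens, which is purely an analogy: the loop keeps iterating until it is confident (the update stops changing) or the budget runs out, so easy inputs stop early and hard ones use more steps.

```python
def think_then_answer(problem: float, budget: int = 10_000,
                      tol: float = 1e-9) -> tuple[float, int]:
    """Iterate internal steps until confident or out of budget.

    A real thinking model emits thinking tokens; here each 'thought'
    is one Newton update toward sqrt(problem).
    """
    guess = max(problem, 1.0)
    for step in range(1, budget + 1):
        next_guess = 0.5 * (guess + problem / guess)  # one internal "thought"
        if abs(next_guess - guess) < tol:             # confident: stop early
            return next_guess, step
        guess = next_guess
    return guess, budget                              # budget exhausted
```

The key property mirrored here is that the number of iterations is chosen by the process itself, not fixed in advance.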

How Thinking Models Reason: Learning to Strategise and Self-Correct

The ability for these models to “think” is achieved through reinforcement learning (RL). After initial pre-training, the model undergoes an RL stage where it’s trained on many different tasks, receiving positive or negative rewards based on whether it solves the task correctly. This remarkably general training recipe allows the model to interpret a vague signal of correctness and backpropagate this through its thinking loop to shape how it uses its internal computation and tokens.
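The reward structure can be sketched with a deliberately tiny example. This is not the training recipe itself, just a multi-armed-bandit toy under invented assumptions (a task is “solved” iff the thinking budget exceeds a difficulty threshold, with a small per-iteration token cost), showing how a correctness reward can shape how much internal computation gets used.

```python
import random

def train_budget_policy(difficulty: int = 6, arms=(2, 4, 8, 16),
                        episodes: int = 500, token_cost: float = 0.01,
                        seed: int = 0) -> int:
    """Toy RL sketch: learn how many thinking iterations to spend.

    Reward is +1 for a correct answer, -1 otherwise, minus a small
    per-iteration cost; an epsilon-greedy bandit averages rewards per arm.
    """
    rng = random.Random(seed)
    value = {a: 0.0 for a in arms}
    count = {a: 0 for a in arms}
    for _ in range(episodes):
        if rng.random() < 0.1:                        # explore
            budget = rng.choice(arms)
        else:                                         # exploit best estimate
            budget = max(arms, key=lambda a: value[a])
        solved = budget >= difficulty                 # stand-in for correctness
        reward = (1.0 if solved else -1.0) - token_cost * budget
        count[budget] += 1
        value[budget] += (reward - value[budget]) / count[budget]
    return max(arms, key=lambda a: value[a])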

Researchers observed some truly remarkable emergent behaviour during this training. For example, in an integer prediction problem, the model was seen using its thinking tokens to:

Pose a hypothesis.

Test out the hypothesis.

Reject its own idea when it found things weren’t working (e.g., stating “this formula doesn’t hold”).

Try an alternative approach.
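The pose/test/reject/retry loop above can be sketched in code. For an integer-sequence toy problem, the hypotheses and their order are invented for illustration; the point is the shape of the loop, where a hypothesis that “doesn’t hold” is discarded and an alternative is tried.

```python
def predict_next(seq: list[int]) -> tuple[str, int]:
    """Pose hypotheses for an integer sequence, test each, reject
    failures, and commit to the first one that holds."""
    hypotheses = [
        ("arithmetic",
         lambda s: all(s[i + 1] - s[i] == s[1] - s[0] for i in range(len(s) - 1)),
         lambda s: s[-1] + (s[1] - s[0])),
        ("geometric",
         lambda s: s[0] != 0 and s[1] % s[0] == 0
         and all(s[i] * (s[1] // s[0]) == s[i + 1] for i in range(len(s) - 1)),
         lambda s: s[-1] * (s[1] // s[0])),
        ("squares",
         lambda s: all(v == (i + 1) ** 2 for i, v in enumerate(s)),
         lambda s: (len(s) + 1) ** 2),
    ]
    for name, holds, extend in hypotheses:
        if holds(seq):                 # test the hypothesis
            return name, extend(seq)   # it holds: commit to it
        # otherwise: reject it and try an alternative
    raise ValueError("no hypothesis fits")
```

Unlike this hard-coded list, a thinking model generates and evaluates its hypotheses in free-form text, which is what made the emergent behaviour so striking.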

This capacity for self-correction and iterative refinement was astonishing to the researchers. Beyond just self-correction, the model learns various sophisticated reasoning strategies, including:

Breaking down problems into various components.

Exploring multiple solutions.

Drafting fragments of code and building them modularly.

Performing intermediate calculations.

Using tools.

All these strategies fall under the umbrella of using more test-time compute to deliver a smarter response.

The Impact and Future of Thinking

This “thinking” capability isn’t just a fascinating research concept; it’s actively driving more capable models and accelerating overall AI progress. It synergistically combines with existing paradigms like pre-training (scaling data and model size) and post-training (scaling human feedback quality). This combined investment leads to a multiplicative effect and overall faster model improvement. Empirically, reasoning performance tracks closely with increasing test-time compute.

Beyond raw capability, thinking offers developers and users granular control over quality versus cost. While previous models offered a discrete choice of model sizes to balance quality and cost, thinking introduces a continuous “budget,” providing a much more granular slider for how much capability is desired for a given class of tasks. Thinking budgets are now available in certain advanced models, allowing users to fine-tune cost-to-performance ratios and push performance higher for demanding applications.
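A hedged sketch of that continuous slider: some providers expose an explicit token budget for the thinking stage rather than a discrete effort level. The field names below (`thinking.budget_tokens`) follow one provider's style and the model name is hypothetical; treat the exact shape as an assumption and check your provider's documentation.

```python
def thinking_budget_payload(prompt: str, budget_tokens: int) -> dict:
    """Build a request body with an explicit thinking-token budget,
    the continuous counterpart of a discrete effort level."""
    if budget_tokens < 0:
        raise ValueError("budget must be non-negative")
    return {
        "model": "example-thinking-model",  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

Because the budget is a number rather than a size tier, it can be tuned per task class: a few hundred tokens for routine queries, tens of thousands for the hardest ones.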

Looking ahead, the focus is on:

Improving Reasoning: Generally making models even smarter.

Efficiency: Ensuring the thinking process is as efficient as possible, reducing instances of “overthinking” and making it more cost-effective.

Deeper Thinking (Deep Think): This is a very high-budget ‘thinking’ mode built on top of advanced models, designed for extremely hard problems. It leverages much deeper and parallel chains of thought that can integrate to produce stronger solutions. For instance, on the USA Mathematical Olympiad, this Deep Think approach significantly boosts performance; it can also run asynchronously, letting the model work for extended periods to arrive at robust solutions.

Open-ended Coding Tasks: The ability for models to spend longer thinking on complex coding problems could enable tasks that previously took months to be completed in minutes.

Pushing Human Understanding: Inspired by figures like mathematician Ramanujan, the ultimate goal is for models to contemplate deeply from a small knowledge base, building up vast knowledge and artefacts to push the frontier of human understanding.

In essence, this “thinking” capability marks a significant stride in AI development, moving beyond immediate, fixed-compute responses to models that can internally reason, self-correct, and strategically explore solutions, much like the human mind. The future of AI is looking increasingly thoughtful!

Here are some links to learn more about this topic:

https://platform.openai.com/docs/guides/reasoning?api-mode=responses

https://openai.com/index/introducing-openai-o1-preview