What is Reasoning AI?
In early 2025 we were treated to the introduction of promising new AI models, like DeepSeek’s R1 and OpenAI’s o3: models that are supposed to be different from the ones we’ve seen before, namely models that can actually reason about a question or a problem. However, Reasoning AI is not new; it has been pursued since the early beginnings of AI. This post describes how it all started, and how it’s going.

Symbolic AI
The first AI application ever was the Logic Theorist, introduced in 1956. It was designed by Allen Newell and Herbert A. Simon in an attempt to mimic human reasoning. Newell and Simon didn’t know how to write a computer program, as they were social scientists, so they asked computer scientist Cliff Shaw to do the coding. Fun fact: the three of them first had to design a programming language (IPL, a forerunner of LISP) before Shaw could actually use it to write the program.
The Logic Theorist was successful in that it was the first non-human to prove dozens of mathematical theorems. It was able to mimic human reasoning by treating the task like searching a tree. The idea was to start at the root with an initial hypothesis and to navigate subsequent branches by deducing, or logically inferring, new statements until it arrived at the theorem it needed to prove.
It also had to deal with the potential combinatorial explosion of branches in the search tree. Computing resources were scarce at the time, which ruled out brute-force techniques, so it applied heuristics to efficiently find a likely path to the solution, which was another novelty.
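To make the idea concrete, here is a minimal sketch of heuristic (best-first) search over a toy deduction graph, in the spirit of the Logic Theorist but not its actual algorithm. The derivation graph and the heuristic scores are made up for illustration.

```python
# A minimal sketch of heuristic (best-first) search: from each statement,
# possible inference steps branch out, and a heuristic score decides which
# branch to explore next instead of trying every combination.
import heapq

# Hypothetical graph: statement -> statements derivable in one inference step.
derivations = {
    "axioms": ["lemma_1", "lemma_2"],
    "lemma_1": ["lemma_3"],
    "lemma_2": ["lemma_3", "theorem"],
    "lemma_3": ["theorem"],
}

# Hypothetical heuristic: lower scores look "closer" to the target theorem.
heuristic = {"axioms": 3, "lemma_1": 2, "lemma_2": 1, "lemma_3": 1, "theorem": 0}

def prove(start: str, goal: str):
    # Always expand the most promising statement on the frontier first.
    frontier = [(heuristic[start], start, [start])]
    visited = set()
    while frontier:
        _, statement, path = heapq.heappop(frontier)
        if statement == goal:
            return path
        if statement in visited:
            continue
        visited.add(statement)
        for nxt in derivations.get(statement, []):
            heapq.heappush(frontier, (heuristic[nxt], nxt, path + [nxt]))
    return None

print(prove("axioms", "theorem"))  # ['axioms', 'lemma_2', 'theorem']
```

The heuristic prunes the search: the program follows the branch that looks closest to the goal instead of expanding every possible branch.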
Logical inference was only one of several approaches for automated reasoning in the new field of artificial intelligence, for instance:
| Approach | Goal | Example |
| --- | --- | --- |
| Logical inference | Reasoning with definitions and facts | Logic Theorist, General Problem Solver, PROLOG, Datalog |
| Probabilistic inference | Reasoning under uncertainty | Bayesian networks |
| Rule-based inference | Reasoning with conditions (production rules, e.g., "if-then-else") | Expert systems |
| Ontological inference | Reasoning with knowledge graphs | RDF, OWL, SWRL |
| Mathematical inference | Reasoning with constraints | Linear programming |
What these approaches have in common is that inference is performed by a computer program. As with the Logic Theorist, they result in software programmed by humans. To infer knowledge, or to reason, all these programs manipulate symbolic representations that are both human-readable and machine-readable, hence the name Symbolic AI. For many years, Symbolic AI was the dominant scientific paradigm in the field of artificial intelligence.
Connectionist AI
An alternative paradigm is Connectionist AI. Here, human cognitive abilities are mimicked using artificial neural networks (ANN). Like the human brain, an ANN consists of many interconnected neurons—hence the name Connectionist AI.
How Do Neurons Work in ANNs?
You can think of these neurons as tiny processing units that work together: they take signals from the outside, transform them, and propagate them through the network until some kind of output is delivered. But instead of the electrochemical signals of a biological neural network, an artificial neural network processes numerical data.
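As a rough illustration, here is a minimal sketch of a single artificial neuron. The inputs, weights, and bias are made up, and real networks contain many layers of such units.

```python
# A single artificial neuron: multiply each input signal by a weight,
# sum the results, add a bias, and pass the total through a non-linear
# activation function.
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the incoming signals.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the result into the range (0, 1).
    return 1 / (1 + math.exp(-total))

# Three numerical input signals, e.g. pixel intensities (made-up values).
print(neuron([0.2, 0.8, 0.5], [0.4, -0.6, 0.9], bias=0.1))
```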
Early Successes
It turned out these ANNs were pretty good at pattern matching. For instance, some early applications could already take pixelated images of handwriting as input, processing one character at a time and feeding the numerical value of each pixel into the input layer of the network. Then, after numerous calculations using the signals and the weights, the network could finally “predict” which letter it had “seen.” By repeating this for every single character, a bitmap image of handwritten text could be transformed into an ASCII file.
Training ANNs
So how do you create such an ANN, or “model” as we say nowadays? You could try to figure out the exact weights that are needed and enter them manually for each individual cell, or “parameter,” but that quickly becomes infeasible. Instead, these models are trained using so-called “supervised learning” algorithms and a lot of training data.
Training data for supervised learning consists of input-output pairs. Following the handwriting example: many different bitmap images of the various ways the letter “r” can be written, each expected to trigger the output node that represents ASCII value 114.
The learning algorithm runs many iterations. Each iteration exposes the model to all of these bitmaps, producing a prediction for each one. The predictions are then compared to the expected outputs, after which the weights of the parameters are adjusted to compensate for the difference.
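Here is a minimal sketch of that training loop, assuming a toy dataset of four hypothetical “images” of four pixel values each (instead of real handwriting bitmaps) and a single logistic neuron rather than a full network.

```python
# Supervised learning in miniature: compare predictions to expected outputs
# and nudge the weights to reduce the difference.
import math
import random

random.seed(0)

# Hypothetical training data: (pixel values, expected output).
training_data = [
    ([0.9, 0.1, 0.8, 0.2], 1),  # looks like "r"
    ([0.8, 0.2, 0.9, 0.1], 1),
    ([0.1, 0.9, 0.2, 0.8], 0),  # looks like something else
    ([0.2, 0.8, 0.1, 0.9], 0),
]

weights = [random.uniform(-0.5, 0.5) for _ in range(4)]
bias = 0.0
learning_rate = 0.5

def predict(pixels):
    total = sum(x * w for x, w in zip(pixels, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Each iteration exposes the model to every example, compares prediction
# with expectation, and adjusts the weights accordingly.
for epoch in range(1000):
    for pixels, expected in training_data:
        error = predict(pixels) - expected
        weights = [w - learning_rate * error * x for w, x in zip(weights, pixels)]
        bias -= learning_rate * error

print(round(predict([0.85, 0.15, 0.9, 0.2]), 2))  # close to 1: probably an "r"
```

The update rule nudges each weight in the direction that shrinks the gap between prediction and expectation, which is the essence of supervised learning.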
Generalization and Early Limitations
The new connectionist models were promising because they were much better at pattern matching than symbolic models. These networks could return the correct output even for inputs they had never “seen” before, meaning they were better at generalisation.
However, in the early days of ANNs—the 1990s—overall performance was disappointing. Although deeper networks were theorised to outperform shallower ones, training deep networks required a lot of processing power and large datasets, which were not widely available at the time.
Deep Learning
It wasn’t until 2012 that technology had advanced to the point where it became economically feasible for universities to train large models. GPUs and big data were both sophisticated and cheap enough to train models that were dozens or even hundreds of layers deep, hence the term deep learning.
Large Language Models (LLMs)
More than ten years later, we have entered the era of Large Language Models (LLMs). Instead of predicting an ASCII value for a bitmap image of a character, LLMs are trained to predict the next word in an unfinished sentence. Very large models—the size of many billions of parameters—are even able to make these predictions in real time, which enables us to use them via a chat interface, like ChatGPT, Claude, and Perplexity.
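To illustrate next-word prediction in the simplest possible way, here is a sketch that uses a tiny made-up corpus and bigram counts instead of a real LLM; the principle of predicting the most likely continuation is the same, the scale and sophistication are not.

```python
# A toy "language model": predict the word that most often followed the
# previous word in the training corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count which word follows which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word: str) -> str:
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat"
```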
We have seen LLMs become smarter by increasing the size of models (i.e., number of parameters) and by increasing the size of training datasets. But does “smarter” mean that these LLMs are really capable of reasoning? Or are they just sophisticated pattern matchers?
Reinforcement Learning
From Pattern Matching to Reasoning
Not too long ago, LLMs struggled with tasks that are straightforward for humans, such as counting the number of letters “r” in the word “Strawberry.” As we’ve seen, LLMs are trained to predict the next word in an unfinished sentence (the “what”), but they are not inherently capable of breaking a word into letters (the “how”).
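For a conventional program the task is trivial, because it operates on individual characters rather than on tokens:

```python
# A plain program sees "Strawberry" as a sequence of letters, whereas an LLM
# sees sub-word tokens, so the character-level structure is not directly
# available to it.
word = "Strawberry"
print(word.lower().count("r"))  # 3
```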
Reinforcement Learning in Action
Reinforcement learning is another way to train a model, not only on the “what” but also on the “how.” For example, AlphaGo by Google DeepMind taught itself to play the game “Go” by playing millions of games against itself. Through this process, it learned all the intermediate steps needed to go from any game situation to a solution, in this case, a victory.
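Here is a minimal sketch of that trial-and-error principle on a toy problem (a three-armed bandit, nothing like Go): the agent is never shown the correct action, it only receives rewards and updates its estimates from that feedback.

```python
# Reinforcement learning in miniature: choose an action, observe a reward,
# update the value estimate for that action.
import random

random.seed(0)

# Hypothetical hidden reward probabilities for three actions.
true_reward_prob = [0.2, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

for step in range(5000):
    # Mostly exploit the best-known action, sometimes explore a random one.
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])
    reward = 1 if random.random() < true_reward_prob[action] else 0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

# Roughly approaches [0.2, 0.5, 0.8]; the best action is clearly identified.
print([round(e, 2) for e in estimates])
```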
Reinforcement Learning from Human Feedback (RLHF)
RLHF is a form of reinforcement learning where a model is trained with feedback from humans. A person rates how useful or truthful an output is, and this feedback is then used to adjust the model’s weights.
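A heavily simplified sketch of that idea, assuming a hypothetical “model” that chooses between just two canned answers and a stand-in function that plays the role of the human rater: the rating acts as a reward, and a REINFORCE-style update makes highly rated answers more likely.

```python
# RLHF in miniature: human ratings are used as rewards to shift the model's
# parameters toward preferred outputs.
import math
import random

random.seed(0)

answers = ["helpful answer", "unhelpful answer"]
logits = [0.0, 0.0]          # one parameter per candidate answer
learning_rate = 0.1

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def human_rating(answer):
    # Stand-in for a real human rater: prefers the helpful answer.
    return 1.0 if answer == "helpful answer" else 0.0

for step in range(500):
    probs = softmax(logits)
    choice = random.choices(range(2), weights=probs)[0]
    reward = human_rating(answers[choice])
    # REINFORCE-style update: raise the probability of choices that were
    # rated above a simple baseline, lower it otherwise.
    for i in range(2):
        grad_log_prob = (1.0 if i == choice else 0.0) - probs[i]
        logits[i] += learning_rate * (reward - 0.5) * grad_log_prob

print([round(p, 2) for p in softmax(logits)])  # strongly favours the helpful answer
```

In practice the human ratings are usually used to train a separate reward model first, which then supplies the reward signal at scale.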
In addition, models can be trained using a complementary technique called “Chain of Thought (CoT) prompting.” Here, the model is encouraged to “take its time,” break down the problem, and use step-by-step reasoning to arrive at the answer. The model is then evaluated not only on the “what” (usefulness or truthfulness) but also on how it arrived at the answer.
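For example, a CoT prompt differs from a plain prompt mainly in that it asks for intermediate steps; the exact wording below is made up for illustration.

```python
# Plain prompt versus Chain of Thought prompt for the same question.
question = "How many letters 'r' are there in the word 'Strawberry'?"

plain_prompt = question

cot_prompt = (
    question
    + "\nThink step by step: spell the word out letter by letter, "
    "mark every 'r' you encounter, and only then state the final count."
)

print(cot_prompt)
```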
Recent Examples
A practical example is R1 from DeepSeek, a model that shook the world in early 2025. DeepSeek R1 is a case in which both “large-scale reinforcement learning” and “Chain of Thought prompting” were applied. Specific details are lacking, but it is very likely that the latest reasoning models, such as o1 and o3 (OpenAI), Gemini 2.0 (Google DeepMind), and Qwen with Questions (QwQ, Alibaba Cloud), were also developed using RLHF and CoT prompting.
Inference Scaling
Reinforcement learning occurs during training or fine-tuning. The question now is: how does a model reason during inference?
Large language models like OpenAI’s GPT-4 are now (February 2025) perfectly capable of counting the number of “r” letters in the word “Strawberry,” and presumably in any other word. Moreover, you will get the answer from GPT-4 almost instantly, with hardly any delay. This is because GPT-4’s enormous number of parameters allows it to produce the answer in a single pass through its layers.
OpenAI’s GPT-4o mini takes a bit more time, approximately 3 seconds. This is because, through inference scaling, the model is given time to “think” about the correct answer. On the other side of the equation, GPT-4o mini uses far fewer resources than GPT-4. Again, specific details are lacking, but a Microsoft paper estimated the number of parameters at 8B (8 billion) for GPT-4o mini versus 1.76T (1.76 trillion) for GPT-4. For comparison, the costs are $0.60 per 1M output tokens for GPT-4o mini versus $60.00 per 1M output tokens for GPT-4 (as of February 2025).
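Specific implementations are not public, but one well-known form of inference scaling is to sample several reasoning chains and keep the most common final answer (often called self-consistency). The sketch below uses a stand-in sampling function instead of a real model call; the point is that more samples cost more inference-time compute but yield a more reliable answer.

```python
# Inference scaling via self-consistency: sample several answers and take a
# majority vote instead of trusting a single pass.
from collections import Counter
import random

random.seed(0)

def sample_answer(question: str) -> str:
    # Stand-in for an LLM call with sampling enabled: usually right,
    # occasionally wrong.
    return "3" if random.random() < 0.7 else "2"

def answer_with_inference_scaling(question: str, n_samples: int = 9) -> str:
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(answer_with_inference_scaling("How many 'r's in 'Strawberry'?"))  # "3"
```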
Conclusion
Reasoning AI is something that was already pursued in the early beginnings of artificial intelligence. In the era of Symbolic AI, we tried to “program” our way to smart, reasoning models. Later, we recognized the potential of Connectionist AI with its pattern-matching capabilities and realized that training models was more effective than manually programming them. The availability of powerful commodity GPUs and large training datasets allowed us to scale deep learning.
The real breakthrough for Reasoning AI came with techniques such as reinforcement learning, Chain of Thought prompting, and inference scaling—all of which have proven useful in developing the next generation of reasoning models.