Are LLMs Intelligent? Can They Extrapolate? And Does It Even Matter?

One of the philosophical questions we encounter all the time is, "Are LLMs intelligent?" Or one of its variants: "Are LLMs just next-word predictors?" and "Do they actually understand concepts?"

This debate has been running for longer than LLMs have been in the limelight, but it seems to be intensifying in recent weeks with the release of Claude Code and the rise of agents.

More and more, the debate is shifting from academic philosophy to practical economics:

  • If AI agents can genuinely reason through novel problems, then we are looking at one future for legal work and humanity.

  • If LLMs are sophisticated pattern-matchers that break down outside their training distribution, then we are looking at a very different one for legal work.

Is There No Ceiling on AI Intelligence?

Last week, Matthew Honnibal (one of the creators of the incredible NLP library, spaCy) published a thoughtful piece arguing that AI capabilities won't plateau. His argument is optimistic, and it is worth exploring a little because it presents a compelling perspective.

Honnibal's core claim is that modern AI isn't "fancy autocomplete" anymore. Early models like GPT-1 and GPT-2 essentially predicted what text would come next based on patterns in their training data. You could reasonably call that sophisticated pattern matching. But today's "reasoning" models work differently. Today’s LLMs use reinforcement learning to develop generalizable problem-solving strategies, similar to how AlphaZero learned chess. When a model generates a chain of reasoning that leads to a correct answer, those patterns get reinforced. Over time, it learns meta-strategies for breaking down big problems into little steps — strategies that can transfer to new domains.

He gives the example that if you ask an early model to solve "Do Berlin, London and Mumbai together have a greater population than Australia?", it would try to generate text that looks like how people answer such questions. It might get lucky if it had seen similar examples. A reasoning model, however, learns to break this down: fetch the population of each city, sum them, fetch Australia's population, compare. These aren't just memorized steps — they're learned strategies for decomposing problems that can apply across domains.
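To make this concrete, here is a minimal sketch of that decomposition written out as explicit steps. It is purely illustrative (not Honnibal's code), and the population figures are rough approximations rather than authoritative data.

```python
# Illustrative only: the kind of decomposition a "reasoning" model learns,
# spelled out as explicit, checkable steps. Figures are rough approximations.

city_populations_m = {"Berlin": 3.7, "London": 8.9, "Mumbai": 12.5}  # millions
australia_population_m = 26.0  # millions

# Step 1: fetch the population of each city and sum them.
combined = sum(city_populations_m.values())

# Step 2: fetch Australia's population and compare.
answer = combined > australia_population_m

print(f"Combined cities: ~{combined:.1f}M vs Australia: ~{australia_population_m:.1f}M")
print("Greater than Australia?", answer)  # False with these approximate figures
```

Each intermediate step produces a result that can be checked, which is the property the reinforcement story below relies on.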

Honnibal argues this is why performance won't plateau. The completion model (predicting next words) might hit diminishing returns, but reasoning models have two additional levers:

  1. you can let them run longer to do more reasoning, and

  2. you can continue improving them through more reinforcement learning.

There's no obvious data bottleneck because the models can learn from their own successful reasoning chains.
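A toy sketch may help fix the idea. It is an illustration only, not any lab's actual training loop: the model samples one of two candidate strategies, an automatic verifier checks the final answer, and strategies that produce verified answers are reinforced so they get sampled more often over time.

```python
import random

# Toy illustration of "learning from your own successful reasoning chains".
# A weighted choice between two strategies stands in for a model that can
# either decompose a problem into steps or emit a plausible-looking guess.

def decompose(question):
    # Reliable strategy: actually work through the steps.
    return sum(question["parts"]) > question["threshold"]

def plausible_guess(question):
    # Unreliable strategy: answer without reasoning.
    return random.choice([True, False])

strategies = {"decompose": decompose, "plausible_guess": plausible_guess}
weights = {name: 1.0 for name in strategies}

def make_question():
    parts = [random.uniform(1, 10) for _ in range(3)]
    threshold = random.uniform(5, 25)
    return {"parts": parts, "threshold": threshold,
            "answer": sum(parts) > threshold}

for _ in range(1000):
    q = make_question()
    name = random.choices(list(weights), weights=list(weights.values()))[0]
    if strategies[name](q) == q["answer"]:   # automatic verification
        weights[name] *= 1.01                # reinforce what worked

print(weights)  # "decompose" ends up with far more weight than "plausible_guess"
```

The load-bearing ingredient is the automatic verifier. The benchmark discussed next is interesting precisely because it removes that ingredient.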

If Honnibal is right, AI will keep getting better at solving novel problems. Legal research, contract analysis, regulatory interpretation — all of it becomes increasingly automatable.

But is he right? Are the models “reasoning”? Or are they just pattern matching, akin to a “next reasoning step predictor”?

What the Maths Tells Us

On February 6, 2026, a group of prominent mathematicians released what they call the "First Proof" benchmark, designed to test exactly this question: can AI really reason?

They published ten research-level math problems that have never appeared on the internet. These aren't contest problems or textbook exercises. They're genuine research questions that arose naturally in the mathematicians' own work, questions they have already solved but have not yet published.

This means the LLMs have not seen these solutions, and there is no automatic way to verify candidate answers: the problems require proofs that must be evaluated by human experts. That eliminates the reinforcement learning mechanism that Honnibal's argument depends on.

When given one shot, the best publicly available AI systems struggle to answer these questions.

This suggests there are two types of “intelligence” for this era of LLMs (a toy sketch of the distinction follows the list):

  • Interpolation: Navigating within the space of existing knowledge. Finding connections, applying known techniques to similar problems, recognizing patterns across domains. This is what today’s AI appears to do exceptionally well.

  • Extrapolation: Genuine, novel reasoning beyond the training distribution. Creating new frameworks, solving problems with no similar examples, reasoning in domains where correctness can't be automatically verified. This is where today’s AI appears to struggle.
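The distinction can be illustrated with a simple curve-fitting sketch. This is a loose analogy, not a claim about how LLMs work internally: a model fit to data from one region predicts well inside that region and typically fails badly outside it.

```python
import numpy as np

# Loose analogy only: fit a polynomial to noisy sine data on [0, 2*pi],
# then evaluate inside that range (interpolation) and outside it (extrapolation).

rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.05, x_train.shape)

model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

x_inside = np.pi / 3      # within the "training distribution"
x_outside = 4 * np.pi     # far outside it

print("interpolation error:", abs(model(x_inside) - np.sin(x_inside)))
print("extrapolation error:", abs(model(x_outside) - np.sin(x_outside)))
# The first error is tiny; the second is typically enormous.
```

Inside the range the fit is excellent; outside it, the same model produces confident nonsense, which is the failure mode the next paragraphs describe.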

For novel research mathematics, there's no automatic verification and no way for the model to know whether its reasoning steps are productive or just plausible-sounding nonsense. Without a verification mechanism, the “reasoning” steps seem to break down.

The evidence suggests that, when tested on problems never seen in training, LLMs can't reliably reason beyond their training distribution, at least not yet.

But so what? If AI is only pattern matching against existing human work, does that really limit its usefulness?

It’s Like The Scarecrow in Munchkinland

The philosophical debate might be asking the wrong question entirely when it comes to legal tech. Having no brain might not matter at all. Even if AI never achieves true extrapolation, and remains fundamentally an interpolation engine (a “next reasoning step predictor”), what percentage of economically valuable legal work actually requires extrapolation?

The work most lawyers do most of the time does not require any extrapolation. They are not inventing novel legal theories. They are not creating unprecedented frameworks. Most of the time, they are applying known approaches to specific situations. Finding the right precedent clause. Spotting the unusual provision. Identifying which standard framework applies to this particular fact pattern.

That type of work is sophisticated interpolation in a high-dimensional data space. And today’s AI might be extraordinarily good at exactly that.

Consider how legal work typically breaks down.

Tier 1: Known techniques on known problems (90% of legal work)

Standard contract review, routine compliance, document comparison, basic research. AI handles this today. Pattern matching works because these problems exist in the training data.

Tier 2: Known techniques on novel problems (10% of legal work)

Applying established frameworks to new facts, creative use of precedent, constructing arguments from existing case law. This is sophisticated interpolation, and AI is getting good enough.

Tier 3: Genuinely novel problems (0-1% of legal work)

Creating new legal theories, identifying unrecognized rights, developing unprecedented approaches. Martin Lipton inventing the poison pill was Tier 3 work. AI probably can't do this reliably yet.

Now Is The Time to Deploy AI

If the above is correct, then this has direct implications for how we should think about deploying technology and the economics of legal services.

  • Deploy AI where interpolation creates value: precedent research, document review, standard drafting, risk flagging in contracts, and compliance monitoring. These are tasks where AI can comfortably navigate known solution spaces. AI's strength at pattern recognition across massive datasets makes it excellent at performing the first cut of legal work.

  • Don't deploy AI where extrapolation matters: bet-the-company litigation strategy, novel regulatory interpretation, first-of-its-kind deal structures, and creating new legal frameworks. These tasks require genuinely novel thinking. AI will generate plausible-sounding output, but without verification mechanisms, no one should trust it.

  • Understand the verification problem. As we've written before, AI creates a "verification tax". For interpolation tasks, you can verify outputs by recognizing correctness — does this clause match the precedent? Does this analysis cite relevant cases? For extrapolation tasks, verification requires re-engaging with the entire problem, which eliminates any efficiency gain.

  • Price and structure accordingly. If 99% of legal work is sophisticated interpolation, then AI should dramatically reduce costs for the vast majority of legal services. The 1% that requires genuine novel thinking should become more expensive, as it becomes the primary differentiator between commodity and premium work.

We are seeing early signs that the market is starting to fragment along these lines. Some firms will offer "AI-powered" services at lower price points for work that fits the interpolation model. Other firms will charge premium rates for the genuinely novel work that still requires human creativity. The firms that will struggle are those that bundle everything together and pretend it all requires the same level of human expertise.

Should We All Quit Being Lawyers?

Most human knowledge work might be sophisticated interpolation.

When a senior partner applies decades of experience to solve a new problem, how much is genuine creative insight versus pattern recognition across thousands of similar previous situations? When a litigator constructs what feels like a novel legal argument, how much is truly original versus creative recombination of existing precedents?

We romanticize human creativity and reasoning. We tell ourselves that professional expertise requires genuine understanding and novel thinking. But does it?

AI is forcing us to be more honest about what most work actually entails.

What percentage of valuable work requires thinking that goes beyond sophisticated pattern matching?

AI is already very good at interpolation. The question is quickly changing from whether AI can think like humans to whether there is any work left for humans.

Most lawyers and firms aren't ready to answer that question.


This blog post was inspired by Matthew Honnibal's "Why I don't think AI is a bubble" and the "First Proof" paper. The questions raised here are particularly relevant for those building or buying legal tech in an era of rapid AI advancement.
