

I recently sat down with Joel Hron, CTO at Thomson Reuters, for an episode of Building with AI: Promises and Heartbreaks (YouTube, Spotify, Apple Podcasts). Joel leads product engineering and AI R&D across legal, tax, audit, trade, compliance and risk. In other words, his teams are putting AI into some of the most high-stakes workflows in the world.
This post is my attempt to capture the parts of our conversation that stuck with me most: how a 100+ year-old brand thinks like an AI company, what it means to ship agents into courtrooms and tax departments, and why Joel thinks evals are getting harder, not easier.
Like many people, Joel’s first reaction when Thomson Reuters approached his previous company, ThoughtTrace, was: “The news company? They do software too?”
As he dug in, he realized:
The core job of Thomson Reuters products is to deliver:
For decades, that meant building very good natural-language search and retrieval. When large language models arrived, they didn’t start from zero. They already had:
That was the foundation for their current wave of generative AI products.
Timing-wise, Joel took over TR Labs right around the ChatGPT moment. Up to that point, Labs was closer to an applied research group, and the mainstream product teams had little experience applying AI in delivery.
So they pivoted hard:
Most early use cases were classic RAG (retrieval-augmented generation):
“One of the core foundational components of good RAG systems is good search at the end of the day.”
Which is convenient if you’ve spent 30 years getting very good at search.
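For readers newer to the pattern, here is a minimal sketch of what classic RAG looks like. The names (`search_index`, `llm_complete`) are placeholders I made up, not anything from TR’s stack; the shape of the flow is the point.

```python
# Minimal RAG sketch: retrieve first, then generate grounded in what was retrieved.
# `search_index` and `llm_complete` are hypothetical stand-ins for a real search
# backend and a model API.

def answer_question(question: str, search_index, llm_complete, k: int = 5) -> str:
    # 1. Good retrieval does most of the work: fetch the k most relevant passages.
    passages = search_index.search(question, top_k=k)

    # 2. Ground the model in those passages instead of its parametric memory.
    context = "\n\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the numbered sources below. "
        "Cite sources as [n]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```

Notice where the leverage sits: step 1 is thirty years of search engineering, step 2 is a prompt.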
Joel shared two key product areas:
A recurring theme was this idea of “fringe value”: if the model is only good enough for the middle 70% of questions, it’s not actually differentiating you in a professional setting. The hard problems live on the edges.
When you move from simple search or classification models to LLM-powered systems, your evaluation playbook basically falls apart.
The old world:
The new world:
Joel’s team had to design new eval rubrics around ideas like:
They tried using LLMs as judges, like everyone else. It helped, but not enough to ship products that lawyers, courts and government agencies would rely on. So they leaned into their unfair advantage: thousands of domain experts.
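For illustration, here is roughly what that hybrid setup can look like in code. This is my sketch of the general pattern, with a hypothetical `llm_judge` scoring function, not TR’s actual pipeline.

```python
# Hybrid eval sketch: an LLM judge triages, domain experts get the hard cases.
# `llm_judge` is a hypothetical function scoring an answer against a rubric.

from dataclasses import dataclass

@dataclass
class EvalResult:
    answer_id: str
    judge_score: float   # 0.0 to 1.0 from the LLM judge
    needs_expert: bool

def triage(answer_id: str, question: str, answer: str, llm_judge) -> EvalResult:
    score = llm_judge(question, answer)  # e.g. faithfulness/completeness rubric
    # Only trust the judge when it is confidently positive; everything
    # ambiguous or failing goes to a human domain expert for review.
    return EvalResult(answer_id, score, needs_expert=score < 0.9)

# results = [triage(a.id, a.question, a.text, llm_judge) for a in sampled_answers]
# expert_queue = [r for r in results if r.needs_expert]
```

The design choice that matters is the asymmetry: the judge can clear easy passes cheaply, but anything ambiguous earns an expert’s eyes.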
I really liked one principle Joel shared: if you can fully remove humans from the loop, you might not be working on a hard enough problem.
That resonated a lot with how we think about analysts at Solid. It also echoes a theme from Stop saying “Garbage In, Garbage Out”, no one cares, where I wrote about how messy reality always forces some human judgment back into the loop.
When your users are lawyers, judges, tax professionals and regulators, “move fast and break things” is not a helpful motto.
We talked about how Joel thinks about expectation alignment.
Interestingly, TR leaned into providing long, comprehensive answers early on. While many systems focused on short responses, their legal research assistant was happy to give you a multi-page analysis.
The point is not to replace the lawyer. It is to:
If there is no link for a cited case, it’s a strong signal something is off. The UX itself is part of the quality and trust story.
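As a toy illustration of using that signal programmatically (my sketch; `lookup_case` is a made-up resolver, not a Westlaw API): flag any citation in the answer that does not resolve to a real document in the source system.

```python
import re

# Toy citation check: every cited case in the answer should resolve to a real
# document in the source system. `lookup_case` is a hypothetical resolver.

CITATION_PATTERN = re.compile(r"\[(?P<cite>[^\]]+)\]")  # e.g. "[Smith v. Jones]"

def unresolved_citations(answer: str, lookup_case) -> list[str]:
    cites = CITATION_PATTERN.findall(answer)
    # A citation with no matching source document is a strong hallucination signal.
    return [c for c in cites if lookup_case(c) is None]
```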
This is a nice counterpoint to the broader internet debate of “AI answers vs sending traffic”. In TR’s world, sending the user back into the source system is exactly the right thing to do.
If you read Behind the scenes: how we think about semantic model generation, you’ll see a similar pattern: summaries and automation are powerful, but they always point back to the human-owned source of truth.
One of my favorite parts of the conversation was how Joel described agents:
“LLMs now seem like they are experts at operating a computer. Full stop.”
Instead of thinking about an all-knowing, general intelligence that contains everything, he sees the current trajectory as:
For Thomson Reuters, that maps very nicely onto their product universe:
The strategy going into 2026:
They already launched a “deep research” capability for Westlaw where:
Is it perfect? No. But it moves the AI assistant from “paralegal prototype” closer to “junior associate who never sleeps”.
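To make “operating a computer” concrete, here is a toy sketch of the loop behind this kind of agentic capability. `llm_decide` and the tool names are hypothetical stand-ins I invented, not Westlaw internals.

```python
# Minimal agent loop sketch: the LLM doesn't contain the answer, it operates
# tools until it has one. `llm_decide` and the tools are hypothetical stand-ins.

def run_agent(task: str, tools: dict, llm_decide, max_steps: int = 10) -> str:
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        # The model chooses the next action given everything seen so far.
        action = llm_decide("\n".join(transcript), list(tools))
        if action.name == "finish":
            return action.argument  # the final answer
        # The harness executes the chosen tool and feeds the result back.
        result = tools[action.name](action.argument)
        transcript.append(f"{action.name}({action.argument}) -> {result}")
    return "Step budget exhausted."

# tools = {"search_caselaw": ..., "read_document": ..., "check_citations": ...}
```

The intelligence is in the choosing; the domain value is in the toolbox.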
If you listened to our earlier episode with Meenal Iyer and read Building an AI-powered Intelligent Enterprise: How a Data Leader Steers Her Team Through the AI Journey, you’ll recognize the same pattern: start with foundations, then let AI ride on top as a multiplier.
There was a small point Joel made that I think many teams building with AI need to hear.
In the early days, they scoped their GenAI products the way you would scope a normal feature:
That worked for a few months at a time, and then the ground shifted:
If you build for “what the model can do today” in a very narrow slice, there is a high chance you have to rebuild the whole thing in six months.
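One common way to hedge against that, sketched below with hypothetical names (my illustration, not necessarily TR’s architecture), is a thin interface between product code and any specific model, so a better model next quarter is a wiring change rather than a rebuild.

```python
# Thin model-abstraction sketch: product code depends on an interface, not a
# specific model, so a better model next quarter is a config change.

from typing import Protocol

class CompletionModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProductFeature:
    def __init__(self, model: CompletionModel):
        self.model = model  # injected, never hard-coded

    def summarize(self, document: str) -> str:
        return self.model.complete(f"Summarize for a legal professional:\n{document}")

# Swapping providers touches one line of wiring, not the feature:
# feature = ProductFeature(model=NewFrontierModelAdapter())  # hypothetical adapter
```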
The adjustment they made:
So what keeps Joel up at night when he thinks about 2026?
Debugging looks less like tuning a single model and more like debugging a distributed system with a slightly moody colleague in the middle.
Users do not always want to wait 20 minutes for a perfect answer. Sometimes they just need to stay in flow:
Balancing that for each use case is still more art than science.
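As a toy sketch of what that balancing can look like in code (hypothetical callables, not a real product’s router): let the user choose depth explicitly, and default to the fast path.

```python
# Latency-budget routing sketch: fast answers keep users in flow; the slow,
# thorough pipeline runs only when the user explicitly opts in.
# `fast_model` and `deep_research` are hypothetical callables.

def answer(question: str, fast_model, deep_research, user_wants_depth: bool) -> str:
    if user_wants_depth:
        # Minutes-long agentic run: acceptable because the user chose it.
        return deep_research(question)
    # Seconds-scale path: good enough to keep the user moving.
    return fast_model(question)
```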
If you’ve used tools like Claude Code or similar agents inside your IDE, you know this feeling:
Joel expects the same thing in law, tax and audit:
At the end of the conversation, I asked Joel for an AI myth he’d bust in one sentence.
His answer:
“This is AGI.” That’s the myth he’d bust.
He doesn’t think we are at, or even particularly close to, AGI in the sense of a general, human-level intelligence. What we have instead are very capable systems for:
Commercially, that might actually be more interesting. Especially if you are Thomson Reuters and your “toolbox” happens to include most of the world’s legal and tax workflows.
For me, this conversation with Joel crystallized a few things I see across many of our customers:
If you’re building anything that sits between AI and critical decisions, I’d strongly recommend listening to the full episode with Joel (see links at the top of this post). And if you want a contrasting view from the data-leader side, check out our recap of the episode with Meenal Iyer in Building an AI-powered Intelligent Enterprise.
In the meantime, I’ll keep asking guests the same closing question I asked Joel:
If AI in the next five years is over-hyped in the short term, but under-estimated in the long term, what does that mean for what you decide to build today?