Strategy · June 2026 · 9 min

The AI programmes authorised in 2024 are reaching their renewals. Most cannot demonstrate what they produced.

The enterprise AI programmes authorised in 2024 are now reaching their renewal cycles, and in most cases the teams responsible cannot demonstrate what was produced — not because the systems failed, but because the evidence was never built to survive the workflow changes that deployment caused. The board asks for the ROI number; the programme lead has recall metrics and usage data. Here is why the measurement gap is almost universal, what it costs at renewal, and what the four programmes that passed their renewals did differently from the start.

Julian R. Mountford

— Founder & Chairman

We are now sitting in a different kind of meeting. The first meeting, two years ago, was about feasibility — whether the programme was technically achievable, whether the vendor was credible, whether a working prototype was plausible in six months. The meeting we are now sitting in, with increasing frequency since the first quarter of this year, is about renewal. The programme has been running. The board wants to know what it has produced.

This should be the simplest question in the room. In most of the organisations we work with, it has become the most difficult. The systems are in production. The teams use them. In several cases, the operational owners who were most resistant at launch have become the loudest advocates. The value is real. The problem is that nobody has the number.

#02 The counterfactual that nobody archived

What happened, in the majority of the first-generation enterprise AI programmes, is that the measurement architecture for demonstrating business value was either never built or was built and then quietly abandoned once the system was live and producing operationally sound metrics. Latency was measured. Recall was measured. Uptime was reported weekly. The operational dashboard looked healthy. What was not measured — in a way that would survive twelve months of process change — was the counterfactual state: what the workflow looked like before the system existed, described precisely enough that an independent observer could reconstruct the comparison at renewal time.

The counterfactual is the hard part. An AI system that goes into a workflow changes the workflow. In the first weeks after launch, the comparison is obvious: the team reviewed fourteen contracts per day before, they review twenty-two now, and the difference is attributable to the system. By month twelve, the team has restructured. The definition of a contract review has shifted. Two lawyers who were doing first-pass review are now handling different work entirely. The metric that existed before — contracts reviewed per lawyer per day — belongs to a process that no longer exists. The board asks for the ROI number and the programme lead can tell them the system has 97 per cent active weekly usage and a recall score of 91.3 per cent. Those numbers are real. They are not the number the board asked for.
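The arithmetic behind that early comparison is simple, and a short sketch makes the fragility visible. The example below is illustrative only: `ThroughputSnapshot` and `relative_uplift` are hypothetical names, not part of any programme described here, and the figures are the fourteen and twenty-two contracts per day used above. The point it tries to show is that the calculation is only meaningful while both snapshots describe the same process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThroughputSnapshot:
    """One measurement of a workflow (hypothetical structure, for illustration)."""
    process_definition: str  # what counts as one unit of work, in plain words
    reviews_per_day: float   # throughput under that definition


def relative_uplift(baseline: ThroughputSnapshot, current: ThroughputSnapshot) -> float:
    """Return the fractional change in throughput against the baseline.

    Refuses to compare snapshots whose process definitions differ, because
    the resulting number would describe two different processes.
    """
    if baseline.process_definition != current.process_definition:
        raise ValueError("process definition has changed; the comparison is not valid")
    if baseline.reviews_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return (current.reviews_per_day - baseline.reviews_per_day) / baseline.reviews_per_day


if __name__ == "__main__":
    before = ThroughputSnapshot("first-pass contract review", 14)
    after = ThroughputSnapshot("first-pass contract review", 22)
    print(f"{relative_uplift(before, after):.1%}")  # prints 57.1%
```

By month twelve the second snapshot would carry a different process definition, and the function would refuse to produce a number at all. That refusal is roughly the position the programme lead is in when the board asks.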

We have watched this exact conversation unfold in three renewal presentations since January. In one case — a document intelligence platform built for a legal services operation, not a system we built — the programme had been live for sixteen months. At month two, the team had documented a 38 per cent reduction in first-pass review time, and the documentation was thorough: labelled examples, a well-defined process comparison, a methodology that could be reproduced. At month sixteen, the programme lead could not reproduce the comparison, because the baseline workflow had been reorganised twice and the original process no longer existed. The board declined to renew the budget. The system was shut down. It had been producing real value.

#03 Capital project governance applied to an operational capability

The structural reason this keeps happening is that enterprise AI programmes are governed like capital projects rather than like operational capabilities. A capital project has a business case, a build phase, a launch milestone, and a post-launch sign-off. After sign-off the asset passes into operations, the project team disbands, and ongoing reporting is handled by the operational function that now owns the asset. Nobody maintains the business case. Nobody updates the comparison methodology when the workflow changes. Those activities belonged to a closed project that no longer exists.

That governance model is appropriate for infrastructure. It is almost exactly wrong for an AI system that improves or degrades over time, that changes the workflow it is embedded in, and whose competitive context is a market of alternative systems that are also improving. The business case for a motorway does not require annual updating. The business case for an AI system that competes with new foundation models every six months, and that operates inside processes that evolve continuously, needs to be treated as a living document rather than a project artefact.

What almost no organisation has done is assign ongoing accountability for the business case to a named person who is not the person accountable for the system's technical performance. The technical performance owner has every incentive to report on what they can measure — and what they can measure is the operational health of the system. The business outcome is a different measurement, belonging to a different part of the organisation, requiring a different cadence. Where no one owns it, no one updates it.

#04 What the programmes that passed their renewals had in common

We have been present at or close to four programme renewals that went well — two we managed from launch, two we were brought in to support during the renewal preparation period. Looking across those four against the three that did not, four structural differences are visible.

The baseline was documented and locked before launch, not merely recorded. In both programmes we managed from the beginning, we ran a deliberate archiving exercise in the four weeks before go-live: a formal description of the pre-system workflow, the specific metrics that described it, and a written methodology for how those metrics had been calculated. The documents were signed off by the programme sponsor and the operational owner and filed in a location that would outlast the project team. That work took approximately three weeks of a senior programme manager's time. It was the work that made the renewal presentation possible sixteen months later.
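For teams that keep the baseline in a machine-readable form alongside the signed document, "locked" can be made checkable. What follows is a minimal sketch under our own assumptions, not a description of the tooling used in those programmes: `BaselineRecord`, `fingerprint` and `is_unchanged` are hypothetical names, the field values are placeholders, and the SHA-256 fingerprint is simply one way to detect later edits.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BaselineRecord:
    """Pre-launch baseline (hypothetical structure, for illustration)."""
    workflow_description: str  # formal description of the pre-system workflow
    metrics: dict              # metric name -> value measured before go-live
    methodology: str           # how each metric was calculated
    signed_off_by: tuple       # roles that approved the record


def fingerprint(record: BaselineRecord) -> str:
    """Return a SHA-256 hex digest of the record's canonical JSON form."""
    payload = json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_unchanged(record: BaselineRecord, filed_fingerprint: str) -> bool:
    """Check a record against the fingerprint filed at sign-off."""
    return fingerprint(record) == filed_fingerprint


if __name__ == "__main__":
    # Placeholder values; the metric figure is the illustrative one used earlier.
    baseline = BaselineRecord(
        workflow_description="Lawyers perform first-pass contract review manually.",
        metrics={"contracts_reviewed_per_day": 14},
        methodology="Daily count of completed first-pass reviews, averaged over the sample period.",
        signed_off_by=("programme sponsor", "operational owner"),
    )
    filed = fingerprint(baseline)  # store this with the signed document
    print(is_unchanged(baseline, filed))
```

The fingerprint does not replace the sign-off or the filing location. It only lets someone at renewal confirm that the baseline they are reading is the one that was approved.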

The business outcome had a named owner who was not the system owner. The person accountable for whether the document review cycle hit its business target was explicitly separated from the person accountable for whether the model was performing correctly. These are different accountability relationships involving different expertise and different reporting lines. Conflating them produces the common failure mode in which the technology team reports green on all metrics, the business unit reports dissatisfaction, and the board cannot reconcile the two readings.

Measurement was quarterly at the business level, not weekly. Weekly operational metrics are the right instrument for finding production failures. Quarterly business outcome reviews are the right instrument for building the renewal case. Most programmes report one rhythm and call it both. They are not interchangeable, and in many mature deployments they have surprisingly weak correlation — a system can be technically healthy and operationally embedded while producing less business value than the case assumed, because the use pattern has drifted from the workflow the business case described.

“A programme that cannot demonstrate its value at renewal is not the same as a programme that has no value — but in a budget meeting, it is treated as one.”

#05 The theory of change, written before we understood why

The fourth practice is less common and in my view the most important. In two of the four successful renewals, the programme had documented — before launch — a theory of change. Not the academic version: a two-page operational document describing what the programme expected to change, why, and what a reasonable observer would expect to see if those changes were occurring. The theory of change is not a baseline metric. It is an argument about mechanism.

The theory of change becomes the lifeline when the quantitative baseline has been overwritten by process change. When the board asks for the number and the number no longer exists in a form that can be cleanly compared to the present, the theory of change provides the structure for a qualitative case: we said this system would change how senior lawyers allocated their attention, and here is what senior lawyers are now doing with their time. That is not a number. It is an argument. Arguments can survive workflow restructuring in ways that baseline metrics cannot, because they describe a mechanism rather than a measurement of a particular state of a particular process.

We now include theory-of-change documentation as a standard deliverable in every programme we manage. We started doing it after watching the first two renewals fail in ways that it might have prevented. We did not understand at the time why we were doing it — it felt like programme hygiene rather than a specific risk mitigation. We understand now.

#06 The second wave of renewals

The second half of 2026 will bring a round of budget renewals covering programmes deployed in the first half of 2025 — the period when enterprise AI investment was at its highest and the urgency to ship often outpaced the rigour around measurement. Those programmes were built when 'it works' was often sufficient justification for continued funding. That threshold has shifted. Boards that have now watched several renewal cycles understand what the conversation requires, and they are asking for it.

The programmes that will not survive this round are not, for the most part, the ones that failed technically. They are the ones that succeeded quietly — absorbed into operations so thoroughly that nobody remembers working without them, the before-state long since overwritten, the business case closed and filed with a project team that no longer exists. Their value is real and invisible. In a budget meeting, invisible is the same as absent.

The teams that will navigate this well are the ones that treated measurement as a delivery requirement from day one — not because they foresaw this particular conversation, but because they were rigorous enough to ask, before a line of code was written, how they would know in eighteen months whether the programme had done what it said it would do. That question is unglamorous. It produces documents that nobody reads during the delivery phase and that become the most valuable single artefact the programme produced when the renewal conversation arrives. We have learned this, as we have learned most things in this practice, from watching programmes that did it the other way.
