Strategy · June 2026 · 10 min

The EU AI Act's high-risk system provisions arrive in August. Most organisations have not yet formally classified their deployed systems — and classification is the easier half of the problem.

The EU AI Act's Annex III high-risk provisions become enforceable on 2 August 2026, and the programmes most exposed are not the ones that moved fast and cut corners — they are the ones that shipped successfully in 2024 and built systems their organisations now depend on. Across six client deployments reviewed in the last two months, four were classifiable as high-risk and none had the Article 14 human oversight architecture the Act requires. Here is what the structural requirements actually demand, why they cannot be satisfied by a documentation layer, and why the market will enforce them before the regulators do.

By Julian R. Mountford, Founder & Chairman
There is a particular meeting we have been in more frequently in the last two months. The client has a production AI system — built in 2024 or the first half of 2025, running in an operational workflow, trusted enough that the teams using it have mostly stopped thinking about it as AI and started thinking about it as infrastructure. The system works. Adoption is solid. Then the general counsel or the compliance director, who has finally been asked to look at the EU AI Act, poses a question: has this system been formally assessed for its classification under the Act?

The honest answer, in most of these conversations, is that it has not. The classification question was either never asked or deferred until the system was live and the usage patterns were clear, and then deferred again while the compliance function attended to things with more immediate deadlines. On 2 August 2026 — six weeks from now — the Annex III high-risk system provisions come into effect. The organisations that will be caught out are not the ones that built AI systems carelessly. They are the ones that built them well, deployed them in consequential workflows, and assumed the regulatory question would be easier to answer once the system's behaviour was understood. It is proving substantially harder.

#02 What the Act classifies as high-risk, and why the category surprises teams

Annex III of the EU AI Act defines eight categories of high-risk AI systems. The categories most relevant to enterprise AI deployments in financial services and professional services are the employment category, the access-to-essential-services category, and — more often than organisations expect — components of critical infrastructure management. The prohibitions on unacceptable-risk practices came into force in February 2025 and received most of the public attention. The Annex III high-risk provisions, less discussed, arrive in August 2026 and affect a substantially larger number of production systems.

The employment category is the broadest in practice. Any AI system that participates in decisions about how workers are recruited, assigned tasks, monitored for performance, or evaluated — including systems whose outputs are reviewed by a human before a final decision is made — is in scope, provided it is used in the EU or its outputs affect EU residents. The Act does not distinguish between a system that makes employment decisions autonomously and a system that produces ranked lists or performance signals that a human then acts on. If the system's outputs influence decisions about individuals' employment conditions, it is high-risk.

This classification surprises organisations most often in three specific contexts. First, any internal performance management tool that produces signals from employee data — output tracking, task completion rates, pattern deviation — is likely in scope. Second, any recruitment or talent intelligence tool that scores, ranks, or profiles candidates, even as a first-pass instrument reviewed by a recruiter, is in scope. Third, any AI system embedded in a workflow that determines which work items are assigned to which people is in scope if that assignment influences pay, evaluation, or continuity of employment. The phrase 'decision support' in the original project brief is not a classification. The Act is interested in how the system is used, not in how it was described when the budget was approved.
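The three contexts above can be turned into a first-pass screen to run across a portfolio of deployed systems. The sketch below is illustrative only: `DeploymentProfile`, its field names, and `employment_flags` are hypothetical, and a flag means the system warrants a formal Annex III reading, not that it has been classified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentProfile:
    """How a system is actually used, not how the project brief described it."""
    name: str
    used_in_eu_or_affects_eu_residents: bool
    produces_signals_from_employee_data: bool
    scores_or_ranks_candidates: bool
    assigns_work_items: bool
    assignment_influences_pay_evaluation_or_continuity: bool


def employment_flags(profile: DeploymentProfile) -> list[str]:
    """Return the reasons a system needs a formal employment-category review."""
    if not profile.used_in_eu_or_affects_eu_residents:
        return []
    flags = []
    if profile.produces_signals_from_employee_data:
        flags.append("performance signals derived from employee data")
    if profile.scores_or_ranks_candidates:
        flags.append("scores, ranks, or profiles candidates")
    if (profile.assigns_work_items
            and profile.assignment_influences_pay_evaluation_or_continuity):
        flags.append("work assignment that influences pay, evaluation, or continuity")
    return flags


# Hypothetical example: a tool described internally as "decision support".
triage_tool = DeploymentProfile(
    name="candidate-shortlister",
    used_in_eu_or_affects_eu_residents=True,
    produces_signals_from_employee_data=False,
    scores_or_ranks_candidates=True,
    assigns_work_items=False,
    assignment_influences_pay_evaluation_or_continuity=False,
)
print(employment_flags(triage_tool))
```

Note that the screen takes no input describing whether a human reviews the output. That omission is deliberate and mirrors the point made above: human review before the final decision does not take a system out of scope.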

Across the six deployments reviewed in the last two months, four met the high-risk threshold under the employment or access-to-essential-services categories. In none of the four had the team that carried out the original build performed a formal Annex III classification exercise. In two cases, the team was confident the system did not qualify, because it had been described throughout the project as a decision-support tool. That description did not match the functional reality of how the outputs were being used.

#03 What the architectural requirements actually demand

For high-risk systems, the Act sets out requirements that are architectural in nature. They cannot be satisfied by documentation alone, and they cannot be retrofitted into a production system by adding a governance layer around the outside of it. The requirement that creates the greatest practical gap in the systems we have reviewed is Article 14: human oversight.

Article 14 requires that high-risk AI systems be designed and developed so that natural persons can effectively oversee them while they are in use. The Act then specifies what effective oversight means: the overseer must understand the system's capabilities and limitations well enough to identify anomalous functioning; the overseer must be able to disregard or override the system's output; and it must be possible to halt the system if necessary. These are three distinct requirements. All three must be satisfied by design, not by governance arrangement.

The first — interpretable outputs — is a genuine architectural choice made during development. A retrieval-augmented system that returns its sources alongside its outputs satisfies Article 14's understanding requirement relatively naturally. A classification model that returns a score and a category label does not, regardless of the confidence interval attached. The second — structured override — requires that human disregard of the system's output be a first-class operation: logged, attributed, and legible to anyone auditing the system's behaviour. An operator who ignores a flag is not the same as an operator who exercises an override. The distinction matters to the Act because one produces an audit record and the other does not.
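To make the difference between ignoring a flag and exercising an override concrete, here is a minimal sketch of an override as a first-class operation. It assumes an in-memory, append-only log; the names `OverrideRecord`, `OverrideLog`, and `record_override` are hypothetical, and a production system would need durable, tamper-evident storage that this sketch does not provide.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideRecord:
    """One human decision to disregard or replace a system output."""
    output_id: str       # identifier of the system output being overridden
    overseer: str        # the named person exercising the override
    system_output: str   # what the system produced
    human_decision: str  # what the overseer decided instead
    reason: str          # justification, required
    recorded_at: str     # UTC timestamp, ISO 8601


class OverrideLog:
    """Append-only: records can be added and exported, never edited."""

    def __init__(self) -> None:
        self._records: list[OverrideRecord] = []

    def record_override(self, output_id: str, overseer: str,
                        system_output: str, human_decision: str,
                        reason: str) -> OverrideRecord:
        # An unattributed or unexplained override is indistinguishable
        # from an ignored flag, so it is rejected here.
        if not overseer.strip() or not reason.strip():
            raise ValueError("an override needs a named overseer and a reason")
        record = OverrideRecord(
            output_id=output_id,
            overseer=overseer,
            system_output=system_output,
            human_decision=human_decision,
            reason=reason,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def export(self) -> str:
        """Serialise the log for someone auditing the system's behaviour."""
        return json.dumps([asdict(r) for r in self._records], indent=2)
```

The design choice that matters is the rejection path: the override cannot be recorded without attribution and a reason, which is what makes the resulting record usable in an audit.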

The third requirement is the one teams consistently underestimate. A system that requires a deployment operation to stop is not capable of being stopped by a natural person during operation, in the sense the Act intends. In a performance management system assessed in April, the practical procedure for suspending the system's weekly output was to contact the engineering team and request a deployment hold. That is a change management process with a human in the loop, not an operational stop control. The Act's language is specific: the system must be able to be halted by designated human overseers in a way that preserves its outputs and records for subsequent review.
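An operational stop control, as distinct from a deployment hold, might look something like the sketch below. It is an illustration under stated assumptions rather than a reference design: `OverseenSystem`, `HaltEvent`, and `SystemHalted` are hypothetical names, the model call is a placeholder, and state is held in memory for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class SystemHalted(Exception):
    """Raised when an output is requested while the system is halted."""


@dataclass(frozen=True)
class HaltEvent:
    overseer: str
    reason: str
    halted_at: str  # UTC timestamp, ISO 8601


@dataclass
class OverseenSystem:
    designated_overseers: frozenset
    outputs: list = field(default_factory=list)
    halt_events: list = field(default_factory=list)
    halted: bool = False

    def halt(self, overseer: str, reason: str) -> HaltEvent:
        """Stop the system at runtime; no deployment operation involved."""
        if overseer not in self.designated_overseers:
            raise PermissionError(f"{overseer} is not a designated overseer")
        event = HaltEvent(overseer, reason,
                          datetime.now(timezone.utc).isoformat())
        self.halt_events.append(event)
        self.halted = True  # existing outputs and records are left untouched
        return event

    def produce_output(self, item: str) -> str:
        if self.halted:
            raise SystemHalted("halted by a designated overseer")
        result = f"scored:{item}"  # placeholder for the real model call
        self.outputs.append(result)
        return result
```

Halting sets a runtime flag and records who halted the system and why; it does not delete or roll back anything, so prior outputs remain available for subsequent review.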

Retrofitting these three requirements onto a system that was not designed with them costs, in our recent experience, between six and fourteen weeks of engineering, depending on the system's architecture. Building them in from the start — at design time, before the development sprint begins — adds roughly one sprint to a typical enterprise AI engagement. The asymmetry is not surprising. It is the same asymmetry that appears in every domain where structural requirements are easier to satisfy before a system exists than after it is in production and trusted.

#04 The technical documentation the Act actually requires

High-risk systems must be accompanied by technical documentation demonstrating compliance. The required contents are specified in Annex IV, and most items — system description, intended purpose, development methodology, performance benchmarks — can be assembled from existing project documentation with moderate effort. Two items are consistently absent.

The first is a foreseeable-misuse analysis: a description of the ways the system might be used other than as intended, the risks those uses create, and the mitigations in place. Most enterprise AI project documentation was not written adversarially. The team building the system was focused on making it work, not on cataloguing the ways it might be misused by someone with different incentives. The misuse question is genuinely different from the accuracy question, and 'reasonably foreseeable' under EU law has a meaning broad enough to include uses the builder could have anticipated with reasonable care — not merely the ones they thought of at the time.

The second absent item is a post-market monitoring plan: a description of how the system's performance in production will be tracked, who is responsible for that tracking, and what triggers a formal review of the system's classification or behaviour. Most organisations with production AI systems have monitoring dashboards. Almost none have a named individual accountable for monitoring the system's behaviour relative to its approved intended purpose, with a documented escalation procedure for deviations. The difference between an operational dashboard and a post-market monitoring plan is the difference between a metric and an accountability structure.
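The gap between a dashboard and a monitoring plan can be shown as a data structure. In the sketch below, a plan cannot be constructed without a named owner and an escalation contact, and each metric carries an explicit review trigger. All names (`MonitoringPlan`, `ReviewTrigger`, the example metric and thresholds) are hypothetical and chosen only for illustration; Annex IV does not prescribe this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewTrigger:
    metric: str
    threshold: float
    direction: str  # "above" or "below"

    def __post_init__(self) -> None:
        if self.direction not in ("above", "below"):
            raise ValueError("direction must be 'above' or 'below'")

    def fired(self, value: float) -> bool:
        if self.direction == "above":
            return value > self.threshold
        return value < self.threshold


@dataclass(frozen=True)
class MonitoringPlan:
    system_name: str
    intended_purpose: str
    accountable_owner: str   # a named individual, not a team alias
    escalation_contact: str
    triggers: tuple

    def __post_init__(self) -> None:
        if not self.accountable_owner.strip():
            raise ValueError("a monitoring plan needs a named accountable owner")
        if not self.escalation_contact.strip():
            raise ValueError("a monitoring plan needs an escalation contact")

    def evaluate(self, observed: dict) -> list:
        """Return the metrics whose observed values call for a formal review."""
        return [t.metric for t in self.triggers
                if t.metric in observed and t.fired(observed[t.metric])]


# Hypothetical plan and observations.
plan = MonitoringPlan(
    system_name="task-allocation",
    intended_purpose="suggest work item assignments for human approval",
    accountable_owner="J. Example",
    escalation_contact="head.of.operations",
    triggers=(
        ReviewTrigger("override_rate", 0.20, "above"),
        ReviewTrigger("human_approval_rate", 0.50, "below"),
    ),
)
print(plan.evaluate({"override_rate": 0.31, "human_approval_rate": 0.72}))
```

A dashboard would display the same two numbers. The plan adds who answers for them and what value obliges that person to act.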

Retroactive technical file completion for a system that has been in production for twelve months is possible, but requires access to design-phase decisions that may not have been documented at the level of detail Annex IV demands, and to the people who made them. We have completed this work three times in the last six months. It takes between three and six weeks and is more substantive than the word 'documentation' implies — it is, in practice, a partial re-audit of the system's design against requirements that did not exist when the design was approved.

#05 Why the compliance layer approach fails

The default response when an organisation identifies a compliance gap — architectural or documentary — is to commission what might be called a compliance layer: additional governance documents, a human-review step added to the interface, and a steering group to oversee the system's ongoing performance. This approach satisfies the appearance of the Act's requirements while failing their substance.

The Act does not require evidence of compliance. It requires compliance — structural, architectural, verifiable compliance that exists in the system's design and can be demonstrated through the technical documentation. An override button added to a production interface in May 2026 that carries no associated log, no defined owner, and no documented escalation procedure does not satisfy Article 14. A governance committee that meets quarterly to review aggregate performance metrics does not satisfy the post-market monitoring requirement, which calls for systematic data collection on the system's functioning with defined review criteria and defined triggers for re-evaluation. A risk analysis written retrospectively to explain why the unmitigated risks are acceptable is not a risk management plan — it is a description of the risks that went unmanaged.

The enforcement implication is practical. The EU's AI supervisory authorities, established at national level under Article 70, will examine technical files rather than governance presentations. The questions they are trained to ask are specific: show me the override mechanism; show me the log of when it was used; demonstrate the stop control; show me the foreseeable-misuse analysis and describe what changed in the system's design as a result of it. A compliance layer that sits beside the system rather than inside it cannot answer these questions, because the answers live in the architecture, not in the documentation.

The willingness to identify where the gaps actually are — rather than where the documentation suggests they should not be — is what distinguishes organisations that will pass regulatory scrutiny from those that will not. In two procurement conversations this year, we were asked to describe an AI system with a documented compliance gap and explain what was being done about it. Both times, the honest answer produced more confidence than a presentation suggesting no gaps would have done. Sophisticated buyers have learned to read pristine compliance documentation with scepticism.

#06 The UK position, and why building to the EU standard is the cheaper choice

The UK has diverged deliberately from the EU's statutory approach. The government's pro-innovation framework assigns AI governance to existing sector regulators — the FCA, the ICO, the CQC — applying principles within their existing remit rather than categorical requirements. For organisations operating solely within the UK, the EU Act's Annex III obligations are not currently mandatory, unless the system's outputs affect EU residents.

Most of the organisations we work with do not operate solely in the UK. They operate across the UK, Europe, and in several cases the Gulf, where the UAE's AI governance framework takes an accountability-centred approach rather than a categorical risk classification model. For those organisations, the architecture question is whether to build to the most demanding standard applicable across their jurisdictions, or to build to the minimum in each and retrofit as the jurisdictional footprint expands.

We have recommended the former, consistently, and have had it challenged on cost grounds each time. The cost argument does not survive examination. Building Article 14-compliant oversight into a system's design costs roughly one sprint at design time. Retrofitting it after production deployment costs, as described above, between six and fourteen weeks. The UK framework may remain stable, or it may converge with the EU position under commercial and regulatory pressure — that pressure is already visible in the voluntary adoption of EU-style documentation requirements by some UK-facing AI vendors responding to procurement questionnaires from European counterparties. An organisation that builds to the EU standard now has a system that is jurisdictionally portable. One that builds to the UK minimum and later discovers it is operating in scope of the EU Act will pay the retrofit cost at a moment it did not choose.

#07 What the remaining six weeks allow

The available actions, with six weeks before the August enforcement date, are narrower than they were a year ago. They are not empty.

The first priority is a formal classification exercise — not a reading of guidance summaries, but a direct engagement with the Annex III text applied to each production AI system, asking the question in the Act's terms: does this system's intended purpose fall within a listed category, and is it used in a way that affects the rights or interests of individuals in the EU? The answer is usually clearer than anticipated once the actual text is applied to a specific deployment, rather than to the general description of the problem the system was built to solve. The systems that are ambiguous in the abstract tend to resolve quickly against a named workflow.

For each system identified as high-risk, the Article 14 requirements should be assessed directly and honestly. Can the human responsible for the system's outputs understand, from what the system produces, how that output was reached and where it might be wrong? Is there a structured mechanism for recording when an output is disregarded, and does that record exist in a location auditable by someone other than the engineering team? Can the system be stopped by its designated operational owner — not by requesting a deployment rollback — within a defined time frame? These are yes or no questions. The honest answers are usually immediate.
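Because the three questions are binary, the assessment is easy to record in a form that can be tracked per system. A minimal sketch follows, with the hypothetical name `Article14Assessment` and field names of our own choosing; it records answers and lists gaps, and is not a substitute for the assessment itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article14Assessment:
    system_name: str
    output_is_understandable: bool        # can the overseer see how and where it may be wrong?
    override_is_recorded_and_auditable: bool  # outside the engineering team
    owner_can_stop_without_deployment: bool   # within a defined time frame

    def gaps(self) -> list[str]:
        """List the oversight requirements the system does not yet meet."""
        found = []
        if not self.output_is_understandable:
            found.append("understanding: outputs do not show how they were reached")
        if not self.override_is_recorded_and_auditable:
            found.append("override: no structured, auditable record of disregarded outputs")
        if not self.owner_can_stop_without_deployment:
            found.append("stop: halting requires a deployment operation")
        return found

    def passes(self) -> bool:
        return not self.gaps()


# Hypothetical example: understandable outputs, but no override log or stop control.
assessment = Article14Assessment(
    system_name="performance-signals",
    output_is_understandable=True,
    override_is_recorded_and_auditable=False,
    owner_can_stop_without_deployment=False,
)
for gap in assessment.gaps():
    print(gap)
```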

The technical file work is harder to rush without compromising it. The foreseeable-misuse analysis, in particular, requires the people who built and operate the system — not a compliance function working from a template. The people who know what the edge cases look like, and what happens when the system produces an output the original design did not anticipate, are the people who can write the analysis correctly. That work should begin now, regardless of whether a complete file can be assembled before August.

The longer observation is about sequence rather than deadline. The organisations that will have the cleanest position in 2027 and 2028 are not the ones that respond fastest to August 2026. They are the ones that have made Annex III classification a standard step in the AI system design phase, so that oversight mechanisms, stop controls, risk analyses, and monitoring plans are present at launch rather than assembled in the weeks before a regulatory review. The Act has not changed what a well-governed AI system looks like. It has provided, for the first time, a statutory description of what a poorly governed one fails to include.
