Why AI Approval Is Becoming the Defining Challenge for Model Risk Teams
As AI adoption accelerates, model risk teams are under pressure to approve systems faster while maintaining rigorous governance standards. This article explores why the real challenge is not speed, but how institutions classify and govern different AI use cases — particularly those operating in regulated control functions. Featuring insights from Behavox, it highlights the structural failures behind AI approval bottlenecks and outlines how leading institutions are redesigning governance frameworks.
Mar 20, 2026
James Burgess, Senior Regulatory Compliance Analyst, Behavox
Tags: Model risk
The views and opinions expressed in this content are those of the thought leader as an individual and are not attributed to CeFPro or any other organization
  • AI approval pressure is rising as deployment accelerates and regulatory scrutiny intensifies

  • Uniform governance frameworks are creating bottlenecks across both low- and high-risk AI use cases

  • Leading firms classify AI by approval burden before validation begins, enabling more targeted oversight

  • Structural failures - not model quality - drive most approval breakdowns, particularly late MRM involvement and poor documentation

  • Misclassified use cases expose significant gaps between perceived and actual risk levels

  • Regulated control functions require significantly higher standards of explainability, transparency, and auditability

  • Model risk teams are shifting from gatekeepers to early-stage governance partners

Ahead of Advanced Model Risk Europe, we spoke with James Burgess, Senior Regulatory Compliance Analyst at Behavox, about the growing challenge of AI approval in financial services. Drawing on experience supporting over 100 institutions, he explores why approval bottlenecks persist, where governance frameworks are breaking down, and how model risk teams can shift from reactive validation to shaping AI systems for compliance from the outset.

Model risk teams are being asked to approve AI faster while regulatory scrutiny keeps increasing. Is this tension resolvable — or are MRM functions being set up to fail? 

It is resolvable — but only if we stop treating it as a speed problem. The instinct is to ask "how do we approve AI faster?" But the right question is "which AI actually requires deep scrutiny, and which doesn't?" 

Many institutions apply a broadly uniform governance process to all AI. That creates problems at both ends — genuinely high-risk systems don't always get the scrutiny they need, while low-risk internal tools get bogged down in validation queues built for a different era. 

The institutions managing this well have done something deceptively simple: they classify AI use cases by approval burden before the governance process begins. That changes everything. It allows model risk teams to concentrate rigorous work on systems that genuinely warrant it — those used in regulated control functions where a wrong output carries regulatory, financial, or reputational consequence. 

AI approval doesn't have to be a bottleneck. It becomes one when governance frameworks treat a meeting-summarization tool the same way they treat a communications surveillance system. Those are categorically different problems, and they require categorically different responses. 
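
By way of illustration only, here is a minimal sketch of the kind of pre-validation triage step described above, written in Python. The tier names, fields, and thresholds are assumptions made for the example; they are not the Behavox AI Risk Policy framework or any regulatory standard.

```python
# Illustrative sketch only: a hypothetical triage step that assigns an approval
# burden before any validation work is scheduled. Tier names, fields, and criteria
# are assumptions for illustration, not the Behavox AIRP framework.
from dataclasses import dataclass
from enum import Enum


class ApprovalBurden(Enum):
    LIGHT = 1      # e.g. internal productivity tools such as meeting summarization
    STANDARD = 2   # decision-support tools with human review of every output
    ELEVATED = 3   # outputs feed material business or customer decisions
    FULL = 4       # regulated control functions: surveillance, AML, conduct risk


@dataclass
class AIUseCase:
    name: str
    regulated_control_function: bool   # operates in surveillance, AML, market abuse, etc.
    drives_regulatory_decisions: bool  # an output can cause an obligation to be met or breached
    human_in_the_loop: bool            # a person reviews outputs before they are acted on


def classify(use_case: AIUseCase) -> ApprovalBurden:
    """Assign the approval burden before the governance process begins."""
    if use_case.regulated_control_function:
        return ApprovalBurden.FULL
    if use_case.drives_regulatory_decisions:
        return ApprovalBurden.ELEVATED
    if not use_case.human_in_the_loop:
        return ApprovalBurden.STANDARD
    return ApprovalBurden.LIGHT


# A meeting-summarization tool and a communications surveillance system land in
# categorically different queues, so validation effort is targeted accordingly.
print(classify(AIUseCase("meeting summarizer", False, False, True)))  # ApprovalBurden.LIGHT
print(classify(AIUseCase("comms surveillance", True, True, False)))   # ApprovalBurden.FULL
```

The point of the sketch is the ordering: classification runs before any validation work begins, so rigorous effort is concentrated on the systems where a wrong output carries regulatory, financial, or reputational consequence.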

What are the most common reasons AI systems fail model risk approval — and is the industry diagnosing this correctly? 

The conventional diagnosis focuses on model quality: the model is too opaque, too complex, too difficult to explain. That's real - but it's not where most failures actually originate. 

Having worked through AI approval processes with more than 100 financial institutions, we find that the failures we see most consistently are structural, not technical. 

The most common failure mode is late MRM involvement. Technology and compliance teams build or procure an AI system, and model risk is brought in at the end to validate something that was never designed to be validated. By that point, documentation is incomplete, architecture decisions are locked, and explainability has been treated as an afterthought. Approving it becomes extremely difficult. Rejecting it creates significant business friction. Nobody wins. 

The second failure is use case misclassification. A team builds what they consider a decision-support tool, but the outputs are actually driving consequential regulatory decisions. When MRM examines it properly, it's a Tier 4 control function that was governed as a Tier 2 tool. The gap between those categories is enormous.

The third failure is documentation debt. The model logic, training data, and assumptions were never captured contemporaneously. Validators are asked to approve a system based on reconstructed documentation. That is not a governance process - it is archaeology. 

All of these are preventable. They are not failures of model quality. They are failures of process design. 

How should model risk teams treat AI in regulated control functions differently from general AI tools? 

Significantly more rigorously, and much earlier in the lifecycle. 

When AI operates in a regulated control function - communications surveillance, market abuse detection, AML, conduct risk monitoring - the stakes are categorically different. These systems' outputs can directly determine whether a regulatory obligation is met or breached. They must meet a standard of transparency, auditability, and explainability that general-purpose AI is rarely designed for. 

For these use cases, model risk involvement should begin before architecture decisions are made. The choice between a large foundation model and a specialized, deterministic system should be informed by governance requirements - not decided by the technology team and handed to validation as a fait accompli. 

In practice, this means defining explainability requirements before procurement begins. It means establishing what "auditable output" means for the specific use case. It means requiring third-party vendors to demonstrate - not merely assert - that their system can be independently validated. 

Generic foundation models pose significant challenges for Tier 4 use cases. Training data provenance is often unclear, model logic is opaque, and outputs are probabilistic in ways that are difficult to document at the standard regulators require. That doesn't mean they can't be used - but the governance bar is materially higher, and institutions must go in with eyes open. 

This is precisely why Behavox developed the AI Risk Policy framework. Financial institutions deploying AI in compliance and surveillance functions needed a governance standard calibrated to what regulators actually expect - not generic AI governance, but use-case-specific governance for the highest-burden functions. 

Agentic AI is appearing on every risk agenda right now. What makes it materially harder to govern than conventional models?

Agentic AI is a genuinely different governance problem, and the industry is only beginning to engage with it seriously. 

A conventional AI model takes an input and produces an output. You can validate that relationship. You can test it, document it, monitor it. The governance surface is bounded. 

An agentic system takes a goal and sequences its own actions to achieve it. It makes intermediate decisions, uses tools, calls other systems, and produces outcomes through a chain of steps that may not be predictable in advance. The governance surface is unbounded — or at minimum, far harder to bound. 

For model risk, this creates three specific challenges. First, how do you validate behavior that is emergent and context-dependent? Traditional backtesting approaches do not transfer cleanly. Second, who is accountable for each decision in the chain? When an agent makes five sequential decisions to reach an outcome, accountability becomes difficult to assign. Third, how do you detect drift or anomaly when the system's normal behavior is inherently variable? 

For Tier 4 use cases, the stakes are higher still. An agentic system triaging compliance alerts, routing suspicious activity reports, or escalating conduct risk cases requires a standard of governance that current frameworks were not built for. 

Our view is that model risk teams should demand substantially higher standards of documentation, transparency, and human oversight before approving agentic AI for regulated functions. The framework exists — apply the Tier 4 classification rigorously and let that drive your documentation and monitoring requirements. 


What is the single most important question a model risk team should ask when reviewing AI for a regulated function - that most teams currently don't ask? 

"Show me what happens when this goes wrong - and how we know."

Most AI approval conversations focus on performance when the system is working correctly. Validators review accuracy benchmarks, test cases, performance metrics. That is necessary but not sufficient. 

The more important question - particularly for agentic AI - is: what does failure look like, how quickly can it be detected, and what is the containment mechanism? If the team presenting the system cannot answer those questions clearly, the system is not ready for approval in a regulated function. 

This is also a design question. If you require an answer to "how do we know when it goes wrong" before build begins, you force the architecture to include monitoring, alerting, and human-in-the-loop checkpoints from day one. Systems built with that requirement embedded are significantly easier to approve than systems where monitoring is retrofitted. 
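
As a purely illustrative sketch, and assuming hypothetical names and a hypothetical confidence threshold rather than any Behavox or regulatory specification, a checkpoint designed to answer "how do we know when it goes wrong" from day one might look like this:

```python
# Illustrative sketch only: a hypothetical human-in-the-loop checkpoint built in
# from the start, so failure detection and containment are part of the architecture
# rather than retrofitted. All names and thresholds here are assumptions.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-checkpoint")


@dataclass
class ModelOutput:
    case_id: str
    decision: str
    confidence: float


def checkpoint(output: ModelOutput, confidence_floor: float = 0.85) -> str:
    """Log every output, alert on anomalies, and route low-confidence cases to a human."""
    log.info("case=%s decision=%s confidence=%.2f",
             output.case_id, output.decision, output.confidence)

    if output.confidence < confidence_floor:
        # Containment mechanism: nothing consequential happens without human review.
        log.warning("case=%s routed to human review queue", output.case_id)
        return "human_review"

    return output.decision


# Example: a low-confidence alert-triage decision is escalated instead of auto-actioned.
print(checkpoint(ModelOutput(case_id="A-1042", decision="close_alert", confidence=0.62)))
```

The design choice the sketch illustrates is that logging, alerting, and the human-review route exist before the first output is ever acted on, which is what makes such a system materially easier to approve later.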

That single question transforms the validation exercise into a governance design conversation — and that is where model risk functions can add the most value. Not as the department that approves or rejects at the end, but as the function that shapes how AI systems are built for regulated environments from the outset. 

The shift from gatekeeper to governance partner is where the most effective MRM functions are heading. 

Download the practical framework for classifying AI by approval burden.

Behavox builds AI for regulated control functions in financial services. Behavox AI Risk Policies (AIRPs) have been approved by more than 100 financial institutions. 
