Event Q&A
When Models Fail Under Stress and Decision Integrity Is on the Line
As stress testing expectations intensify and systemic risks grow more complex, institutions are re-examining how models perform under extreme conditions. With cascading effects, nonlinear dynamics, and AI-driven analytics reshaping the landscape, firms must design stress frameworks that remain credible when calibration fails, balancing advanced techniques with judgement, governance discipline, and decision integrity under pressure.
Feb 20, 2026
Tanveer Bhatti
Tanveer Bhatti, Early Stage Fintech Investor, Former Group Head of Model Risk, Revolut
Tags: Model risk
The views and opinions expressed in this content are those of the thought leader as an individual and are not attributed to CeFPro or any other organization.
  • Examines why models lose validity during crises and extreme stress events

  • Discusses systemic cascades and cross-functional dependencies in stress scenarios

  • Explores what makes a scenario severe yet plausible and governance-ready

  • Assesses the limits of AI in forward-looking stress testing

  • Positions model risk management as “decision integrity engineering” at portfolio level

  • Highlights the value of practitioner forums in sharing real-world failure modes

Ahead of Advanced Model Risk Europe, we spoke with Tanveer Bhatti about the practical realities of stress testing in extreme environments. Drawing on experience across G-SIBs and fintech, he reflects on why models break under pressure, how systemic cascades propagate through institutions, and why effective oversight requires more than periodic validation; it requires designing systems that fail safely.

Stress testing is increasingly expected to capture extreme and systemic shocks. What are the biggest challenges in ensuring models remain valid and reliable under these conditions? 


I saw this managing model risk at a G-SIB during 2008. The models kept outputting numbers right through the crisis. Validity collapses at the moment you need it most, and nobody in the room has standing to say so because the model said it. Then COVID removed any remaining comfort: 2020 showed how quickly your calibration set becomes irrelevant. Outliers appeared in almost every model simultaneously. Backtesting told you nothing useful at that point.

The philosophical problem runs deeper than most institutions admit. You're asking models built on repeatable mass phenomena to price non-repeatable events. That's a category error disguised as a quantitative one. What actually holds under pressure is explicit uncertainty bounds, sensitivity to the variables that drive outcomes, and hard circuit breakers when outputs move outside credible ranges. The framework has to be designed to fail safely.
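
As a concrete illustration of that last point, here is a minimal sketch of a hard circuit breaker, assuming a hypothetical wrapper around a single scoring or pricing model. The names (`CredibleRange`, `ModelCircuitBreaker`, the fallback callable) are illustrative only, not anything described in the interview; the point is that the bounds are fixed at validation time and a breach routes to a conservative path instead of flowing silently downstream.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CredibleRange:
    """Explicit uncertainty bounds, set at validation time rather than inferred at runtime."""
    lower: float
    upper: float

class ModelCircuitBreaker:
    """Wraps a model so out-of-range outputs trip a fail-safe path."""

    def __init__(self, model: Callable[[dict], float],
                 credible_range: CredibleRange,
                 fallback: Callable[[dict], float]):
        self.model = model
        self.range = credible_range
        self.fallback = fallback   # deliberately conservative override
        self.tripped = False       # flag for escalation to human review

    def predict(self, x: dict) -> float:
        output = self.model(x)
        if not (self.range.lower <= output <= self.range.upper):
            self.tripped = True    # the number is no longer trusted
            return self.fallback(x)
        return output
```

A desk might wire this up as `ModelCircuitBreaker(pd_model, CredibleRange(0.0, 0.25), fallback=lambda x: 0.25)`, so that a tripped breaker returns a deliberately conservative value and leaves a flag for review rather than letting an implausible output drive a decision.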

 

Interconnected risks and nonlinear effects are particularly difficult to model during periods of stress. How should institutions approach capturing cascading effects and cross-functional dependencies more effectively? 


If you've sat in valuation control at a G-SIB, you learn that pricing, risk, P&L, liquidity, and operations are not separable in a crisis. Pre-2008, we modelled as though they were. Cascades travel through actions. A margin call triggers collateral posting, that hits funding, that changes client behaviour, that pressures capital. If the stress framework doesn't model that chain, you're testing nodes and calling it a network test. 
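
A toy sketch of the difference between testing nodes and testing the chain, with each round feeding the previous round's damage forward. Every parameter here (the haircut shock, the outflow rate, the fire-sale loss factor) is a hypothetical placeholder, not a calibrated value from any real framework.

```python
def run_margin_cascade(state: dict, rounds: int = 5, haircut_shock: float = 0.10) -> dict:
    """Propagate a margin-call shock through funding, client behaviour, and capital.

    state holds toy balance-sheet items: derivatives_exposure, funding_buffer,
    deposits, and capital. All dynamics below are illustrative assumptions.
    """
    for _ in range(rounds):
        margin_call = state["derivatives_exposure"] * haircut_shock
        # collateral posting drains the funding buffer
        funding_gap = max(0.0, margin_call - state["funding_buffer"])
        state["funding_buffer"] = max(0.0, state["funding_buffer"] - margin_call)
        # a visible funding squeeze changes client behaviour (deposit run-off)
        outflow = 0.05 * state["deposits"] if funding_gap > 0 else 0.0
        state["deposits"] -= outflow
        # covering the gap at fire-sale prices pressures capital
        state["capital"] -= 0.5 * funding_gap
        if state["capital"] <= 0:
            break   # the cascade has consumed the buffer
    return state
```

The point of even a toy like this is that the terminal capital position depends on the feedback loops; stressing each balance-sheet item in isolation on the same inputs would never surface it.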

I'd hedge on claiming there's a single correct modelling approach. Network models capture some things, agent-based models others, scenario overlays something else again. The practical answer is triangulation, because analytical rigour is useless if it can't be run and explained and governed.


 

Scenario design sits at the heart of credible stress testing. What makes a scenario both sufficiently challenging and still plausible, while remaining grounded in historical experience? 


Plausibility doesn't come from historical frequency. It comes from a coherent causal story with mechanics that are defensible. The most common failure is stacking bad outcomes without coherence. That produces a list of fears. Severity comes from tightening the narrative and pushing the parameters that actually drive outcomes. Funding spreads, haircuts, behavioural run-off. The severity follows from the mechanics. 

Plausibility is also partly social. It depends on what senior management and supervisors will accept as credible. That's not intellectual weakness; it's institutional reality. The scenario has to survive the room as well as the model. One test I'd always apply is to ask what would falsify it. If nothing can, it's not a scenario.

 

Looking ahead, where do you see the limits of AI in stress testing, and how should firms balance advanced analytics with expert judgement when designing forward-looking scenarios? 


At the fintech where I built the model risk function, we oversaw AI and ML models screening close to a billion transactions a month. Scale didn't make the core problem smaller; it revealed how quickly spurious correlations propagate when you're not watching. AI won't know when it's wrong. It smooths over inconsistencies and states causal links with confidence that hasn't been earned.

AI generates candidate scenarios and scans for weak signals. Human judgement selects the mechanisms, sets the severities, defines the failure conditions. If a scenario can't be explained in plain English to the Risk Committee, it doesn't run. Where I'd genuinely hedge is on time horizons. The governance and controls have to be designed to survive the speed at which AI capability is moving.

 

As stress testing frameworks become more complex, how do you see the role of model risk management evolving to provide effective oversight and challenge in the years ahead? 


A billion transactions a month changes your perspective on control. You don't get quarterly comfort blankets. The system moves daily, sometimes hourly. Validating each model and signing off annually is not a serious control posture for the world we're now in. The harder problem is portfolio-level exposure. Common data dependencies, shared vendors, model monoculture. These are invisible when models are reviewed individually and only appear at the system level.

Before any model touches a stress decision, you need to know what breaks it and what happens when it does. Kill-switches and degraded modes are not optional features. Decision error under stress has to be measured. That's the actual job. Decision integrity engineering, not model sign-off.
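
A hedged sketch of what a kill-switch with a pre-approved degraded mode could look like; `DegradedModeRunner` and its logging scheme are assumptions for illustration, not a description of any production system. Recording the primary-versus-fallback divergence on every call is one way, though not the only way, to make decision error measurable after the event.

```python
import logging

logger = logging.getLogger("model_oversight")

class DegradedModeRunner:
    """Run a primary model alongside a simpler, pre-approved fallback.

    Throwing the kill-switch routes every call to the fallback; the
    divergence between the two is logged continuously so decision error
    under stress can be quantified after the fact.
    """

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.killed = False

    def kill(self, reason: str) -> None:
        self.killed = True
        logger.warning("kill-switch thrown: %s", reason)

    def score(self, x) -> float:
        fallback_out = self.fallback(x)
        if self.killed:
            return fallback_out     # degraded mode: simple, explainable, pre-approved
        primary_out = self.primary(x)
        # divergence is the raw material for measuring decision error,
        # not just model error
        logger.info("primary-fallback divergence=%.4f", abs(primary_out - fallback_out))
        return primary_out
```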

 

Why are industry forums like Advanced Model Risk Europe important for helping practitioners share insights, challenge assumptions, and keep pace with these evolving stress testing and model risk challenges? 


The failure modes in this industry are shared, yet firms keep relearning them in isolation. These forums are where unknown unknowns get named early. You learn what failed elsewhere before it fails in your shop. They also normalise what good looks like across institutions, which matters when supervisory expectations are moving and regulators compare notes.

The conversations that move practice forward are the ones where people trade real failure modes and operating model lessons. The stuff that hurt. If the room is comparing what actually happened in production, what the controls missed, which assumptions turned out to be fragile, that's worth everyone's time. If it stays at the level of principles, it isn't.

Tanveer Bhatti Bio

Tanveer Bhatti is a senior risk executive and private equity investor specialising in the operationalisation of AI governance and Model Risk Management within global financial institutions. He previously served as Head of Group Model & AI Risk at Revolut, where he built the second-line-of-defence capability from inception for a global fintech serving 75M+ customers. Prior to that, he was Managing Director and Head of Model Risk at Citi, overseeing a global team of ~175 quantitative professionals and a portfolio of 2,500+ models across a ~$2.6tn balance sheet. He currently advises Series A and B AI control infrastructure firms on embedding regulator-grade governance for Tier-1 US, UK, and EU financial institutions.
