01 Weeks 1–3

Discover + MRM scoping

SR 11-7 documentation template aligned with the bank's model risk committee. Investigator workflow audited live on the SIU floor.

02 Weeks 4–10

Pilot with senior investigators

12 named investigators on real cases. Faithfulness, citation accuracy, override rate tracked weekly. MRM in every retrospective.

03 Weeks 11–14

Production rollout

All 80+ investigators onboarded. Audit trail wired to existing case management. Model card cleared by second-line validators.

04 Beyond 14

Monitoring and extension

Monthly model performance review. Year-two scope opened to adjacent use cases in AML and fraud strategy.

The challenge


A top-10 US bank needed to clear its tier-2 fraud investigation queue faster without weakening investigation quality or failing model risk review. Tier-2 investigators spent a mean of 14 minutes per case. The queue grew about 4% per quarter, tier-3 escalations grew 11%, and the operation handled roughly 1.2M escalations a year. Total tier-2 capacity was the binding constraint on the bank's entire fraud operations cycle time. Either throughput moved, or the bank kept staffing up to chase a queue that compounded faster than hiring could fill it.

The bank had tried twice before. Both attempts died in Model Risk Management review. The first had no validation framework at all. The second had a challenger model section that ran to eight paragraphs of bullet points but contained no benchmark and no statistical test against a held-out set. The MRM lead asked for a benchmark in the kickoff meeting of the validation review. The team didn't have one. The review closed that afternoon.

This case study describes the third attempt. It had to clear MRM review before go-live, deliver measurable handle-time improvement with no quality regression, and integrate with the bank's existing Actimize case management environment plus downstream connections into multiple core systems. Above all, it had to be defensible: SR 11-7 compliant on go-live day, OCC 2011-12 traceable in audit, and explainable to a board-level risk committee that had been burned twice.

The constraints

The build sat inside a tight regulatory and operational envelope. Five constraints shaped every design decision.

  • SR 11-7 and OCC 2011-12 model risk governance. Every modelling decision had to map back to the bank's model risk policy. Documentation, validation, ongoing monitoring, and challenger model artefacts all had to be in place at go-live, not promised for a later phase.
  • Actimize as the system of record. The copilot had to sit alongside Actimize, not replace it. Investigators worked the queue in Actimize; the copilot enriched the case view inside the existing workflow. No new login, no new screen to learn.
  • Bank VPC, bank KMS, bank IAM. The model ran inside the bank's network, against the bank's key management service, with the bank's identity and access controls. No data left the perimeter. Vendor SaaS LLM endpoints were ruled out in week one.
  • Human-in-the-loop on every disposition. Federal Reserve guidance and the bank's own AI policy both required human decision-making on fraud dispositions. The copilot could summarise, retrieve, and propose. It could not decide.
  • Change-management posture. The tier-2 investigator population had veteran practitioners with strong opinions about prior tooling. The copilot had to win adoption on its merits, not by mandate. Two of the most vocal critics were brought in as co-designers in week six.

Our approach

Our approach rested on four principles: treat model risk management as a build constraint, automate the repetitive parts of the investigator workflow with retrieval, run a challenger model from week two, and measure investigation quality as closely as handle time. Five Rockmere consultants worked alongside 14 bank team members across the engagement.

MRM as a first-class build constraint, not an afterthought. In week one, we mapped every SR 11-7 requirement to a specific engagement deliverable: model documentation, validation plan, challenger model design, monitoring infrastructure, exception handling. All of it sat on the build plan as work to be done, not paperwork to write after the model was finished. The bank's MRM team joined as engagement stakeholders in week three. From then on, they saw every design decision before it was finalised. The named MRM lead reviewed the validation plan draft in week four and returned comments inside a week. That cadence held for every subsequent artefact.

Retrieval over the investigation knowledge base. We watched 23 tier-2 reviews in the first two weeks. The investigator's manual workflow had six steps: read case context, look up applicable fraud policies, search for similar prior cases, summarise findings, propose an investigation plan, execute the plan. The first five were repetitive work that LLMs do well. The sixth was judgment work that LLMs do badly. So we built a RAG architecture that automated steps one through five and left step six untouched. It used hybrid retrieval: BM25 for exact-match policy lookups, dense embeddings for the case-similarity search, and a cross-encoder rerank on the top 50 hits. We evaluated it weekly against a held-out set of 200 known dispositions, scored on faithfulness, citation precision, and disposition similarity to the senior-reviewer ground truth.
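The retrieval flow can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the production stack: the BM25 scorer is a plain-Python implementation, the dense vectors are toy hand-written embeddings, the two candidate lists are merged with reciprocal rank fusion (our assumption here; other fusion schemes would work), and `rerank_score` is a hypothetical stand-in for a cross-encoder.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ranking(scores):
    """Document indices sorted best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf_fuse(rankings, k=60):
    """Merge several rankings with reciprocal rank fusion (assumed fusion scheme)."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

def rerank_score(query, doc):
    """Hypothetical cross-encoder stand-in: share of query tokens present in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def hybrid_search(query, query_vec, docs, doc_vecs, top_n=50, final_k=5):
    """BM25 + dense retrieval, fused, then reranked over the top_n candidates."""
    sparse = ranking(bm25_scores(query, docs))
    dense = ranking([cosine(query_vec, v) for v in doc_vecs])
    candidates = rrf_fuse([sparse, dense])[:top_n]
    reranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return reranked[:final_k]

# Toy corpus and embeddings, invented for illustration only.
docs = [
    "card not present fraud policy chargeback window",
    "account takeover policy password reset",
    "prior case card not present dispute resolved refund",
    "wire transfer limits policy",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
query, query_vec = "card not present fraud", [1.0, 0.0, 0.0]
results = hybrid_search(query, query_vec, docs, doc_vecs, final_k=2)
```

In this toy run the policy document and the similar prior case surface first. The point of the rerank step is that the expensive scorer only ever sees the fused shortlist, never the whole corpus.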

Challenger model from week two. SR 11-7 expects validation to benchmark the primary model against alternative approaches, which in practice means a challenger model. Most AI initiatives bolt one on at validation time. We built ours in week two: a simpler heuristic baseline that ran in parallel on every case and produced the comparative numbers MRM would need. The challenger never went into production. It existed to give the validation report something real to compare against, and to detect drift in the primary model after go-live.
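One way to turn "comparative numbers" into a statistical statement is a paired test on the same held-out cases. The sketch below uses an exact McNemar test, which is our illustrative choice and not necessarily the test in the validation report. The outcome lists are invented; `True` means the model's proposal matched the senior-reviewer disposition.

```python
from math import comb

def mcnemar_exact(primary_correct, challenger_correct):
    """Two-sided exact McNemar test on paired correct/incorrect outcomes.

    Only discordant pairs (one model right, the other wrong) carry information.
    Returns (primary_only, challenger_only, p_value).
    """
    pairs = list(zip(primary_correct, challenger_correct))
    primary_only = sum(1 for p, c in pairs if p and not c)
    challenger_only = sum(1 for p, c in pairs if c and not p)
    n = primary_only + challenger_only
    if n == 0:
        return primary_only, challenger_only, 1.0
    k = min(primary_only, challenger_only)
    # Binomial tail under the null that either model is equally likely to win a discordant pair.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return primary_only, challenger_only, min(1.0, 2 * tail)

# Hypothetical outcomes for 200 held-out cases.
primary = [True] * 140 + [True] * 40 + [False] * 10 + [False] * 10
challenger = [True] * 140 + [False] * 40 + [True] * 10 + [False] * 10
primary_only, challenger_only, p_value = mcnemar_exact(primary, challenger)
```

Because both models score the same cases, a paired test of this kind is more sensitive than comparing two accuracy figures side by side.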

Investigation quality, not just handle time. A 38% handle-time win that came with a 2-point quality drop would have killed the program. So senior reviewers blind-audited a 5% sample of all dispositions, pre-AI and post-AI, with weekly statistical comparison. We told the bank in week one that we would pull the plug ourselves if the audit numbers moved in the wrong direction. That commitment mattered in week nine, when an early prompt iteration produced a 1.4-point quality dip on a Monday sample. We rolled back the prompt the same day and the numbers recovered by Wednesday. The MRM lead noted the rollback in the weekly review as exactly the kind of operational discipline she had been looking for.
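A weekly comparison of this kind can be as simple as a two-proportion z-test on audit pass rates. The sketch below is illustrative: the sample sizes are invented, and the significance gate in `should_roll_back` is an assumption. As the week-nine rollback shows, a team may reasonably act on a smaller dip than a formal test would flag.

```python
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Two-sided z-test for a difference between two pass rates (b minus a)."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def should_roll_back(pass_pre, n_pre, pass_post, n_post, alpha=0.05):
    """Flag a rollback when post-AI quality is lower and the drop is significant."""
    z, p_value = two_proportion_z(pass_pre, n_pre, pass_post, n_post)
    return z < 0 and p_value < alpha

# Hypothetical audit samples of 1,000 dispositions each.
steady = should_roll_back(924, 1000, 927, 1000)  # quality held
dipped = should_roll_back(924, 1000, 880, 1000)  # quality fell sharply
```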

What we delivered

We delivered a production AI fraud investigation copilot, a complete SR 11-7 documentation package, and an operational runbook. The copilot integrated with the bank's Actimize case management environment and ran inside the bank's VPC with the bank's KMS. The system:

  • Reads the case context, customer profile (subject to access controls), and transaction history
  • Retrieves applicable fraud policies and the 5 most-similar prior case dispositions
  • Generates a case summary and proposed investigation plan
  • Surfaces evidence and policy citations inline (every claim links to source)
  • Logs every retrieval, every generated output, and every investigator action to the audit system
  • Provides explicit uncertainty markers when retrieval quality is low
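The last two behaviours, uncertainty markers and audit logging, can be sketched together. Everything named here is hypothetical: the `Citation` and `CopilotOutput` types, the 0.5 threshold on reranker score, and the record layout are illustrations, not the bank's schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

LOW_CONFIDENCE_THRESHOLD = 0.5  # hypothetical cut-off on the best reranker score

@dataclass
class Citation:
    source_id: str
    rerank_score: float

@dataclass
class CopilotOutput:
    case_id: str
    summary: str
    citations: list
    uncertain: bool

def build_output(case_id, summary, citations, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Attach an uncertainty marker when no retrieved source scores above the threshold."""
    best = max((c.rerank_score for c in citations), default=0.0)
    return CopilotOutput(case_id, summary, citations, uncertain=best < threshold)

def audit_record(output, investigator_id, action):
    """Serialise one copilot output plus the investigator's action as a JSON audit line."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "investigator_id": investigator_id,
            "action": action,
            "output": asdict(output),
        },
        sort_keys=True,
    )

# Illustrative identifiers only.
output = build_output("CASE-001", "Summary text", [Citation("POLICY-7", 0.31)])
record = audit_record(output, "INV-42", "accepted_summary")
```

Keeping the marker on the output object itself means the same flag the investigator sees is the one that lands in the audit trail.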

Plus a complete SR 11-7 documentation package: model development documentation, validation report, challenger model documentation, monitoring procedures, model risk classification, exception handling protocol, and the ongoing-monitoring infrastructure to satisfy continuous-validation requirements. The validation report alone ran 84 pages, every section traceable to a specific SR 11-7 paragraph.

Plus the operational runbook. Named on-call rotation between Financial Crimes Technology and the Rockmere team for the 90-day stabilisation window, with a written escalation tree for model anomalies, retrieval-quality regressions, and Actimize integration failures.

The result

The copilot cut tier-2 fraud investigation handle time 38% (from 14.0 to 8.7 minutes), raised daily case disposition rate 57%, and held investigation quality steady, all while clearing SR 11-7 review on the first cycle.

| Metric | Baseline | After 90 days production | Change |
| --- | --- | --- | --- |
| Tier-2 mean handle time | 14.0 min | 8.7 min | −38% |
| Tier-2 daily case disposition rate | 24.3 per investigator | 38.1 per investigator | +57% |
| Senior-review audit quality (blind sample) | 92.4% | 92.7% | unchanged |
| Customer experience NPS (post-fraud-resolution survey) | 31 | 44 | +13 pts |
| MRM review cycles to approval | 2 prior attempts rejected | Single approval cycle | first-pass approval |
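The Change column follows directly from the baseline and 90-day figures. A short check, using only the numbers reported above:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

handle_time_change = pct_change(14.0, 8.7)    # about -37.9, reported as -38%
disposition_change = pct_change(24.3, 38.1)   # about +56.8, reported as +57%
quality_change_pts = 92.7 - 92.4              # about +0.3 points
nps_change_pts = 44 - 31                      # +13 points

# Sanity check: if investigator hours were unchanged (an assumption), the
# handle-time cut alone would allow roughly a 61% throughput gain.
implied_throughput_gain = (14.0 / 8.7 - 1) * 100
```

The observed 57% gain in disposition rate sits just under that implied ceiling, which is what one would expect if handle time was the main driver.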

The system handled 1.2M+ cases in its first year of production with zero MRM-flagged incidents and zero customer escalations attributed to the AI component. The 38% handle-time number is the one the bank's CFO quoted to the board. The single-cycle MRM approval is the number the Chief Risk Officer quoted. The bank's MRM lead has since referenced the validation package as a template in two other AI initiatives inside the same institution.

Engagement timeline

| Week | Workstream |
| --- | --- |
| Week 1 | Walked the contact center floor, watched 23 tier-2 reviews, mapped every SR 11-7 requirement to a build deliverable. Pulled 500 prior dispositions for the held-out evaluation set. |
| Week 2 | Built the evaluation harness. Started the challenger model. First draft of the model risk classification memo. |
| Week 3 | RAG architecture in skeleton form, first retrieval results against the eval set. MRM lead joined the standup. Validation report outline approved by MRM. |
| Weeks 4–5 | Reranker tuning, prompt iteration on the case-summary step, validation report draft one. OCC 2011-12 traceability matrix completed. |
| Weeks 6–8 | Actimize integration. Investigator workflow co-design with six representative tier-2s, two of whom turned out to be the strongest critics and ended up writing the override protocol. Cross-encoder swapped after a faithfulness regression in week 7. |
| Weeks 9–10 | Pilot to 14 investigators. Senior-review audit protocol active from day one of pilot. One prompt rollback in week 9. |
| Week 11 | Formal MRM validation review. Single cycle, approved. |
| Week 12 onward | Phased rollout to full tier-2 organization, ongoing monitoring, weekly drift review. |

What survived past our engagement

Four artefacts outlasted the engagement and now belong to the bank.

  1. The SR 11-7 documentation template. The bank's MRM team adopted our validation report structure as a standard for subsequent AI initiatives. Two adjacent use cases (sanctions investigations, AML transaction monitoring) have since gone live using the same template.
  2. The RAG architecture pattern. The retrieval stack (BM25 + dense + cross-encoder rerank with policy-citation surfacing) became a reusable component inside the bank. The Financial Crimes Technology team maintains it as an internal platform service.
  3. The evaluation harness. The 200-case held-out set is rerun weekly by the bank's MLOps team. Drift on faithfulness or citation precision triggers a ticket to the named owner in Financial Crimes Technology.
  4. A named owner with budget. The Director of Financial Crimes Technology owns the copilot. Ongoing-monitoring spend sits in her cost centre. The escalation tree is written and tested.
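The weekly drift check in item 3 can be sketched as a comparison of each metric against its baseline. The baseline values, the 0.02 tolerance, and the `DriftAlert` type are all hypothetical; opening the actual ticket is left out because that depends on the bank's tooling.

```python
from dataclasses import dataclass

@dataclass
class DriftAlert:
    metric: str
    baseline: float
    current: float
    drop: float

def check_drift(baseline, current, tolerance=0.02):
    """Return one alert per metric that fell more than `tolerance` below its baseline."""
    alerts = []
    for metric, base in baseline.items():
        drop = base - current[metric]
        if drop > tolerance:
            alerts.append(DriftAlert(metric, base, current[metric], round(drop, 4)))
    return alerts

# Invented scores from a weekly rerun of the held-out set.
baseline = {"faithfulness": 0.95, "citation_precision": 0.93}
this_week = {"faithfulness": 0.94, "citation_precision": 0.89}
alerts = check_drift(baseline, this_week)
```

Here only citation precision breaches the tolerance, so only that metric would raise a ticket to the named owner.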

The bank's MRM lead now sits on the AI Risk Committee that reviews every new AI initiative at the institution. She uses the language from our validation report in the committee's review criteria. The credentials that made this engagement possible (named SR 11-7 experience on the Rockmere team) are detailed on our credentials page.

Where this fits

This engagement is representative of our Financial Services practice: AI built with MRM as a design input, an audit trail that stands up to SR 11-7 review, and a human in the loop for every disposition-level decision. The retrieval architecture is detailed in RAG Systems; the SR 11-7 integration is part of our AI Transformation practice in regulated industries.

For comparison, the same regulatory-first build pattern appears in our State Medicaid eligibility AI case study (NIST AI RMF instead of SR 11-7) and in our SAFe® ART launch at a P&C carrier (NAIC and state DOI evidence integration). If you have an AI initiative blocked at MRM, or one that hasn't started because of MRM concerns, get in touch. We have specific experience with SR 11-7, OCC 2011-12, and several major US bank MRM frameworks.