Bank Fraud Investigation AI Case Study. 38% Handle Time Reduction

Weeks 1 to 3

Discover + MRM scoping

SR 11-7 documentation template aligned with the bank’s model risk committee. Investigator workflow audited live on the SIU floor.

Weeks 4 to 10

Pilot with senior investigators

12 named investigators on real cases. Faithfulness, citation accuracy, override rate tracked weekly. MRM in every retrospective.

Weeks 11 to 14

Production rollout

All 80+ investigators onboarded. Audit trail wired to existing case management. Model card cleared by second-line validators.

Beyond 14

Monitoring and extension

Monthly model performance review. Year-two scope opened to AML and fraud strategy adjacent use cases.

The challenge

A top-10 US bank needed to clear its tier-2 fraud investigation queue faster without weakening investigation quality or failing model risk review. Tier-2 investigators spent a mean of 14 minutes per case. The queue grew about 4% per quarter, tier-3 escalations grew 11%, and the operation handled roughly 1.2M escalations a year. Total tier-2 capacity was the binding constraint on the bank’s entire fraud operations cycle time. Either throughput moved, or the bank kept staffing up to chase a queue that compounded faster than hiring could fill it.

The bank had tried twice before. Both attempts died in Model Risk Management review. The first had no validation framework at all. The second had a challenger model section that ran to eight paragraphs of bullet points but contained no benchmark and no statistical test against a held-out set. The MRM lead asked for a benchmark in the kickoff meeting of the validation review. The team didn’t have one. The review closed that afternoon.

This was the third attempt. It had to clear MRM review and go live, deliver measurable handle-time improvement with no quality regression, and integrate with the bank’s existing Actimize case management environment plus downstream connections into multiple core systems. Above all, it had to be defensible: SR 11-7 compliant on go-live day, OCC 2011-12 traceable in audit, and explainable to a board-level risk committee that had been burned twice.

The constraints

The build sat inside a tight regulatory and operational envelope. Five constraints shaped every design decision.

SR 11-7 and OCC 2011-12 model risk governance. Every modelling decision had to map back to the bank’s model risk policy. Documentation, validation, ongoing monitoring, and challenger model artefacts all had to be in place at go-live, not promised for a later phase.
Actimize as the system of record. The copilot had to sit alongside Actimize, not replace it. Investigators worked the queue in Actimize; the copilot enriched the case view inside the existing workflow. No new login, no new screen to learn.
Bank VPC, bank KMS, bank IAM. The model ran inside the bank’s network, against the bank’s key management service, with the bank’s identity and access controls. No data left the perimeter. Vendor SaaS LLM endpoints were ruled out in week one.
Human-in-the-loop on every disposition. Federal Reserve guidance and the bank’s own AI policy both required human decision-making on fraud dispositions. The copilot could summarise, retrieve, and propose. It could not decide.
Change-management posture. The tier-2 investigator population had veteran practitioners with strong opinions about prior tooling. The copilot had to win adoption on its merits, not by mandate. Two of the most vocal critics were brought in as co-designers in week six.

Our approach

The numbers

Top-10 US bank · 11-week build · SR 11-7 cleared

38%

tier-2 fraud investigation handle time reduction

11 wks

build, MRM review included

0

MRM-flagged incidents in year one

Our approach rested on four principles: treat model risk management as a build constraint, automate the repetitive parts of the investigator workflow with retrieval, run a challenger model from week two, and measure investigation quality as closely as handle time. Five Rockmere consultants worked alongside 14 bank team members across the engagement.

MRM as a first-class build constraint, not an afterthought. Week one, we mapped every SR 11-7 requirement to a specific engagement deliverable. Model documentation, validation plan, challenger model design, monitoring infrastructure, exception handling. All of it sat on the build plan as work to be done, not paperwork to write after the model was finished. The bank’s MRM team joined as engagement stakeholders in week three. They saw every design decision before it was made. The named MRM lead reviewed the validation plan draft in week four and returned comments inside a week. That cadence held for every subsequent artefact.

Retrieval over the investigation knowledge base. We watched 23 tier-2 reviews in the first two weeks. The investigator’s manual workflow had six steps: read case context, look up applicable fraud policies, search for similar prior cases, summarise findings, propose an investigation plan, and execute the plan. The first five were repetitive work that LLMs do well. The sixth was judgment work that LLMs do badly. So we built a RAG architecture that automated steps one through five and left step six untouched. It used hybrid retrieval: BM25 for exact-match policy lookups, dense embeddings for the case-similarity search, and a cross-encoder rerank on the top 50 hits. We evaluated weekly against a held-out set of 200 known dispositions, scored on faithfulness, citation precision, and disposition similarity to the senior-reviewer ground truth.
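The retrieval flow described above can be sketched in a few lines. This is a minimal illustration, not the bank's implementation. The engagement write-up does not say how the BM25 and dense result lists were combined, so reciprocal rank fusion is used here as an assumption. The `embed` and `rerank_score` callables are hypothetical stand-ins for the embedding model and the cross-encoder, and the toy documents are invented for the example. The `top_k=50` and `final_k=5` defaults mirror the "top 50 hits" and "5 most-similar prior case dispositions" figures in this case study.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query (sparse, exact-match signal)."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(scores):
    """Document indices ordered best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf(rankings, k=60):
    """Reciprocal rank fusion (an assumed fusion method, not confirmed by the source)."""
    fused = Counter()
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + pos + 1)
    return [doc_id for doc_id, _ in fused.most_common()]

def hybrid_search(query, docs, embed, rerank_score, top_k=50, final_k=5):
    sparse = rank(bm25_scores(query, docs))
    q_vec = embed(query)
    dense = rank([cosine(q_vec, embed(d)) for d in docs])
    candidates = rrf([sparse, dense])[:top_k]          # fused shortlist
    reranked = sorted(candidates,
                      key=lambda i: rerank_score(query, docs[i]),
                      reverse=True)                    # cross-encoder stand-in
    return reranked[:final_k]

# Toy usage with stand-in models; real embeddings and a real cross-encoder would replace these.
docs = [
    "card not present chargeback policy",
    "account takeover password reset",
    "wire transfer recall procedure",
]
vocab = sorted({w for d in docs for w in tokenize(d)})
toy_embed = lambda text: [tokenize(text).count(w) for w in vocab]
toy_rerank = lambda q, d: len(set(tokenize(q)) & set(tokenize(d)))
top = hybrid_search("chargeback policy", docs, toy_embed, toy_rerank)
```

The design point is the split of responsibilities: the sparse ranker catches exact policy wording, the dense ranker catches similar cases phrased differently, and the more expensive reranker only sees the fused shortlist.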

Challenger model from week two. SR 11-7 treats benchmarking against alternative models as part of sound validation, and the bank’s MRM review had already closed one attempt for lacking a benchmark. Most AI initiatives bolt a challenger on at validation time. We built ours in week two: a simpler heuristic baseline that ran in parallel on every case and produced the comparative numbers MRM would need. The challenger never went into production. It existed to give the validation report something real to compare against, and to detect drift in the primary model after go-live.
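As a rough illustration of the comparative numbers a challenger produces, the sketch below scores a primary and a challenger model against the same ground-truth labels. The function names, the label values, and the choice of metrics (accuracy, lift, disagreement rate) are assumptions for the example; the case study does not state which comparison metrics the validation report used.

```python
def accuracy(predictions, truth):
    """Share of cases where the prediction matches the senior-reviewer label."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def compare_to_challenger(primary, challenger, truth):
    primary_acc = accuracy(primary, truth)
    challenger_acc = accuracy(challenger, truth)
    disagreements = sum(p != c for p, c in zip(primary, challenger))
    return {
        "primary_accuracy": primary_acc,
        "challenger_accuracy": challenger_acc,
        "lift": primary_acc - challenger_acc,            # what the primary adds over the baseline
        "disagreement_rate": disagreements / len(truth)  # a sudden shift here can hint at drift
    }

# Invented labels for four held-out cases.
truth      = ["fraud", "legit", "fraud", "legit"]
primary    = ["fraud", "legit", "fraud", "fraud"]
challenger = ["fraud", "fraud", "legit", "fraud"]
report = compare_to_challenger(primary, challenger, truth)
```

Tracking the disagreement rate over time is one simple way a parallel baseline can flag drift: if the two models start disagreeing far more often than they did at validation, something in the inputs or the primary model has changed.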

Investigation quality, not just handle time. A 38% handle-time win that came with a 2-point quality drop would have killed the program. So senior reviewers blind-audited a 5% sample of all dispositions, pre-AI and post-AI, with weekly statistical comparison. We told the bank in week one that we would pull the plug ourselves if the audit numbers moved in the wrong direction. That commitment mattered in week nine, when an early prompt iteration produced a 1.4-point quality dip on a Monday sample. We rolled back the prompt the same day and the numbers recovered by Wednesday. The MRM lead noted the rollback in the weekly review as exactly the kind of operational discipline she had been looking for.
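One common way to run the kind of weekly pre-AI versus post-AI comparison described above is a two-proportion z-test on audit pass rates. The sketch below is an assumption about method, not a record of what the engagement used, and the sample sizes (2,000 audited dispositions per arm) are hypothetical; only the 92.4% and 92.7% rates come from this case study.

```python
import math

def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Two-sided z-test for a difference between two pass rates."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical counts: 1848/2000 = 92.4% pre-AI, 1854/2000 = 92.7% post-AI.
z, p_value = two_proportion_z_test(1848, 2000, 1854, 2000)
quality_regressed = z < 0 and p_value < 0.05
```

Under these assumed sample sizes the difference is well inside noise, which is what "held steady" should mean in a validation report: a stated test and threshold, not an eyeballed comparison.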

What we delivered

We delivered a production AI fraud investigation copilot, a complete SR 11-7 documentation package, and an operational runbook. The copilot integrated with the bank’s Actimize case management environment and ran inside the bank’s VPC with the bank’s KMS. The system:

Reads the case context, customer profile (subject to access controls), and transaction history
Retrieves applicable fraud policies and the 5 most-similar prior case dispositions
Generates a case summary and proposed investigation plan
Surfaces evidence and policy citations inline (every claim links to source)
Logs every retrieval, every generated output, and every investigator action to the audit system
Provides explicit uncertainty markers when retrieval quality is low

Plus a complete SR 11-7 documentation package: model development documentation, validation report, challenger model documentation, monitoring procedures, model risk classification, exception handling protocol, and the ongoing-monitoring infrastructure to satisfy continuous-validation requirements. The validation report alone ran 84 pages, every section traceable to a specific SR 11-7 paragraph.

Plus the operational runbook. Named on-call rotation between Financial Crimes Technology and the Rockmere team for the 90-day stabilisation window, with a written escalation tree for model anomalies, retrieval-quality regressions, and Actimize integration failures.
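The logging and uncertainty-marker behaviours in the system list above can be pictured with a small sketch. Everything here is illustrative: the `AuditRecord` fields, the event names, the 0.35 threshold, and the marker text are hypothetical, and the bank's actual audit schema and confidence logic are not described in this case study.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

LOW_CONFIDENCE_THRESHOLD = 0.35  # hypothetical cutoff on the best rerank score

@dataclass
class AuditRecord:
    case_id: str
    event: str       # e.g. "retrieval", "generation", "investigator_action"
    detail: dict
    timestamp: str

def audit_line(case_id, event, detail):
    """Serialise one audit event as a JSON line for an append-only log."""
    record = AuditRecord(case_id, event, detail,
                         datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(record), sort_keys=True)

def uncertainty_marker(rerank_scores):
    """Return a warning string when no retrieved item scores above the cutoff."""
    if not rerank_scores or max(rerank_scores) < LOW_CONFIDENCE_THRESHOLD:
        return "LOW RETRIEVAL CONFIDENCE: verify sources manually"
    return None

line = audit_line("CASE-001", "retrieval", {"doc_ids": [12, 47], "top_score": 0.22})
marker = uncertainty_marker([0.22, 0.18])
```

Writing one record per retrieval, generation, and investigator action is what lets an auditor reconstruct what the copilot showed and what the human then decided.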

The result

The copilot cut tier-2 fraud investigation handle time 38% (from 14.0 to 8.7 minutes), raised daily case disposition rate 57%, and held investigation quality steady, all while clearing SR 11-7 review on the first cycle.

Metric	Baseline	After 90 days production	Change
Tier-2 mean handle time	14.0 min	8.7 min	−38%
Tier-2 daily case disposition rate	24.3 per investigator	38.1 per investigator	+57%
Senior-review audit quality (blind sample)	92.4%	92.7%	+0.3 pts (statistically unchanged)
Customer experience NPS (post-fraud-resolution survey)	31	44	+13 pts
MRM review cycles to approval	2 prior attempts rejected	Single approval cycle	first-pass approval
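The headline percentages follow directly from the baseline and 90-day figures in the table. A quick check, with `pct_change` as an illustrative helper name:

```python
def pct_change(baseline, after):
    """Relative change from baseline, in percent."""
    return (after - baseline) / baseline * 100

handle_time = pct_change(14.0, 8.7)        # about -37.9, reported as -38%
disposition_rate = pct_change(24.3, 38.1)  # about +56.8, reported as +57%
```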

The system handled 1.2M+ cases in its first year of production with zero MRM-flagged incidents and zero customer escalations attributed to the AI component. The 38% handle-time number is the one the bank’s CFO quoted to the board. The single-cycle MRM approval is the number the Chief Risk Officer quoted. The bank’s MRM lead has since referenced the validation package as a template in two other AI initiatives inside the same institution.

Engagement timeline

Week	Workstream
Week 1	Walked the contact center floor, watched 23 tier-2 reviews, mapped every SR 11-7 requirement to a build deliverable. Pulled 500 prior dispositions for the held-out evaluation set.
Week 2	Built the evaluation harness. Started the challenger model. First draft of the model risk classification memo.
Week 3	RAG architecture in skeleton form, first retrieval results against the eval set. MRM lead joined the standup. Validation report outline approved by MRM.
Weeks 4 to 5	Reranker tuning, prompt iteration on the case-summary step, validation report draft one. OCC 2011-12 traceability matrix completed.
Weeks 6 to 8	Actimize integration. Investigator workflow co-design with six representative tier-2s, two of whom turned out to be the strongest critics and ended up writing the override protocol. Cross-encoder swapped after a faithfulness regression in week 7.
Weeks 9 to 10	Pilot to 14 investigators. Senior-review audit protocol active from day one of pilot. One prompt rollback in week 9.
Week 11	Formal MRM validation review. Single cycle, approved.
Week 12 onward	Phased rollout to full tier-2 organization, ongoing monitoring, weekly drift review.

What survived past our engagement

Four artefacts outlasted the engagement and now belong to the bank.

The SR 11-7 documentation template. The bank’s MRM team adopted our validation report structure as a standard for subsequent AI initiatives. Two adjacent use cases (sanctions investigations, AML transaction monitoring) have since gone live using the same template.
The RAG architecture pattern. The retrieval stack (BM25 + dense + cross-encoder rerank with policy-citation surfacing) became a reusable component inside the bank. The Financial Crimes Technology team maintains it as an internal platform service.
The evaluation harness. The 200-case held-out set is rerun weekly by the bank’s MLOps team. Drift on faithfulness or citation precision triggers a ticket to the named owner in Financial Crimes Technology.
A named owner with budget. The Director of Financial Crimes Technology owns the copilot. Ongoing-monitoring spend sits in her cost centre. The escalation tree is written and tested.
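The weekly drift check on the evaluation harness can be sketched as a comparison of the current run against a stored baseline. The metric values and the 0.02 tolerance below are hypothetical, and the real trigger thresholds and ticketing integration are not described in this case study.

```python
def drift_alerts(baseline, current, tolerance=0.02):
    """List metrics whose weekly score fell more than `tolerance` below baseline."""
    alerts = []
    for metric, base_value in baseline.items():
        drop = base_value - current[metric]
        if drop > tolerance:
            alerts.append(
                f"{metric} dropped {drop:.3f} below baseline {base_value:.3f}"
            )
    return alerts

# Invented scores for one weekly rerun of the held-out set.
baseline = {"faithfulness": 0.95, "citation_precision": 0.93}
this_week = {"faithfulness": 0.91, "citation_precision": 0.93}
alerts = drift_alerts(baseline, this_week)  # each alert would become a ticket to the named owner
```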

The bank’s MRM lead now sits on the AI Risk Committee that reviews every new AI initiative at the institution. She uses the language from our validation report in the committee’s review criteria. The named SR 11-7 experience on the Rockmere team, which made this engagement possible, is detailed on our credentials page.

Where this fits

This engagement is the canonical example for our Financial Services practice: AI built with MRM as a design input, an audit trail that meets SR 11-7 expectations, and a human in the loop on every disposition-level decision. The retrieval architecture is detailed in RAG Systems; the SR 11-7 integration is part of our AI Transformation practice in regulated industries.

For comparison, the same regulatory-first build pattern appears in our State Medicaid eligibility AI case study (NIST AI RMF instead of SR 11-7) and in our SAFe® ART launch at a P&C carrier (NAIC and state DOI evidence integration). If you have an AI initiative blocked at MRM, or one that hasn’t started because of MRM concerns, get in touch. We have specific experience with SR 11-7, OCC 2011-12, and several major US bank MRM frameworks.

Frequently asked

What was the engagement timeline for this bank fraud AI case study?

Eleven weeks from kickoff to first-cycle MRM approval, then a 90-day stabilization window with phased rollout to the full tier-2 organization. The eleven-week clock included a formal SR 11-7 validation review in week 11; we did not extend the build to accommodate the review because the validation artefacts were ready from week three onward.

How did you take an AI system live through SR 11-7 in 11 weeks?

By treating MRM as a first-class build constraint, not as a post-build documentation exercise. Model documentation, validation plan, and challenger model design were deliverables in weeks 2 through 4, parallel to the build. The bank’s MRM team joined as engagement stakeholders by week 3. By the time formal validation review came, the MRM team had been part of the design for 8 weeks.

What tools and model architecture did you use for fraud investigation?

A retrieval-augmented generation pattern over the bank’s investigation knowledge base, prior case dispositions, and applicable policy documents. The LLM (a small/fast model with carefully designed prompting) summarized the case, retrieved comparable prior dispositions, and proposed an investigation plan. The retrieval stack used BM25 for exact-match policy lookups, dense embeddings for case similarity, and a cross-encoder reranker over the top 50 hits. Actimize remained the system of record for case management. We did not use the LLM to make fraud disposition decisions. That remained the investigator’s call.

How was the model validated against SR 11-7 and OCC 2011-12?

A simpler heuristic challenger model ran in parallel on every case from week two onward, producing the comparative numbers MRM required. Senior reviewers blind-audited a 5% sample of all dispositions weekly. The validation report was outlined in week three and drafted in weeks four and five, not left until week eleven, and we walked through every requirement in SR 11-7 and OCC 2011-12 line by line with the bank’s MRM lead and model validation team.

Did you see false-positive or false-negative impact on fraud outcomes?

Investigation quality (measured by senior reviewer audit sampling) was statistically unchanged at 92.4% pre-AI and 92.7% post-AI. Customer experience improved (faster dispositions on legitimate cases, faster resolution on fraud cases). The handle-time savings came from automation of summary, retrieval, and documentation work, not from changing the investigator’s analytical work.

What survived past the engagement?

Four artefacts. The SR 11-7 documentation template the bank’s MRM team adopted as a standard. The RAG architecture pattern, now reused for sanctions investigations, AML transaction monitoring, and account takeover. The evaluation harness with the 200-case held-out set, run weekly by the bank’s MLOps team. And a named owner in Financial Crimes Technology with budget for the ongoing-monitoring infrastructure.
