CPG Demand Planning AI Case Study. 11-Point MAPE Improvement

Weeks 1 to 3

Value stream map of S&OP

Mapped twelve queue points in the demand planning cycle. Forecast accuracy baseline measured per SKU category.

Weeks 4 to 10

Forecasting model pilot

Built and instrumented against real SKU history. MAPE measured weekly per category. Planners review every Monday.

Weeks 11 to 14

Tier huddle around S&OP

Tier 2 cadence on the weekly planning board. Demand signal and supply response linked in the same room.

Beyond 14

Sustain

Planners run the forecast every Monday morning. MAPE cut by 11 points; $40M of working capital freed. Quarterly advisory only.

The challenge

A top-10 US CPG manufacturer was losing $40M+ in working capital to forecast error that the statistical model in SAP IBP could not fix. Averaged across SKU-store-week pairs, the forecast missed actual demand by 24%, and the error followed structural patterns the existing model never captured.

The manufacturer ran demand planning on SAP IBP across about 1,200 SKUs distributed to 50,000 retail outlets. Forecasts were generated at SKU-store-week, and MAPE at that granularity sat at 24%: averaged across SKU-store-week pairs, the forecast missed actual demand by 24%, week after week. The error was not random; it had structural patterns that the existing statistical forecast inside IBP was not equipped to capture.
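For readers less familiar with the metric, here is a minimal sketch of how MAPE at SKU-store-week granularity can be computed. The `ForecastRow` type, the field names, and the two sample rows are illustrative assumptions, not client data, and skipping zero-demand rows is one common convention rather than a documented choice from this engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastRow:
    """One SKU-store-week pair (hypothetical schema for illustration)."""
    sku: str
    store: str
    week: str
    forecast: float
    actual: float


def mape(rows):
    """Mean absolute percentage error, in percent.

    Rows with zero actual demand are skipped because the percentage
    error is undefined there (an assumed convention).
    """
    errors = [abs(r.forecast - r.actual) / abs(r.actual) for r in rows if r.actual != 0]
    if not errors:
        raise ValueError("no rows with non-zero actual demand")
    return 100.0 * sum(errors) / len(errors)


# Illustrative rows only: each forecast misses actual demand by 24%.
rows = [
    ForecastRow("SKU-A", "store-001", "2024-W01", forecast=124.0, actual=100.0),
    ForecastRow("SKU-A", "store-002", "2024-W01", forecast=38.0, actual=50.0),
]
baseline = mape(rows)  # roughly 24, the baseline level described above
```

Note that one over-forecast and one under-forecast do not cancel: MAPE averages the absolute error of each pair, which is why it tracks the safety stock each store actually has to carry.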

The cost was visible in the balance sheet. Safety stock was elevated across the network to cover the error. Working capital tied up in inventory ran $180M above the network-optimised level. Service-level penalties from key retailers averaged $14M a year. Promotional execution suffered because the forecast couldn’t reliably separate baseline demand from promotional lift; the planners were running promo math in spreadsheets on the side. The audit committee had noticed the working-capital line in the prior annual report.

A prior consulting engagement had produced an “AI demand model” delivered as a Jupyter notebook. It ran on a workstation. It was not connected to IBP, not deployed to production, and had no monitoring. The data scientist who built it had taken another job. The notebook had been quietly abandoned by the time we walked in.

We productionised the CPG demand planning forecasting service in 16 weeks, cutting MAPE 11 points at SKU-store-week granularity, integrating with SAP IBP through the standard interface, and delivering $40M+ in annualised working-capital impact independently measured by the client’s finance team.

The constraints

The build sat inside an enterprise planning environment with real audit and integration obligations.

SAP IBP as the system of record. Planners worked in IBP. Supply, procurement, and finance read forecasts from IBP. The AI model could augment IBP but could not replace it. Integration ran through the standard forecast-import interface, no custom middleware.
SOX control environment. Demand forecasts feed inventory valuation and revenue planning, both SOX-relevant. Model changes, version promotion, and forecast generation runs had to be auditable. The internal controls team had to sign off on the change-governance design.
24 months of clean training data, more if dirty. Promotional flags were inconsistent across business units. Store-cluster definitions changed twice in the window. Multi-cultural holiday calendars were missing for two key store clusters. Feature engineering had to repair the data, not just consume it.
Planner trust deficit from the prior notebook. The data science team had burned credibility with the demand planning organisation. Any new model had to earn adoption through workflow fit, not through accuracy claims in a slide deck.
Data engineering team takes ownership at week 16. The client’s data engineering team would operate the service after handoff. Architecture decisions had to fit their existing platform (Databricks plus Snowflake), their existing on-call rotation, and their existing deployment pipeline.
No new capital spend. The budget was operating expense. We had to use the client’s existing platforms, existing licences, and existing data lake.

Our approach

The numbers

Top-10 US CPG manufacturer · IBP-integrated demand model

11%

forecast accuracy gain at SKU level

7%

inventory reduction

16 wks

build to IBP integration

Production-grade from week one. We declined to build a notebook. The model would run as a production service, integrate with IBP via the standard forecast-import interface, run on a defined schedule, and have monitoring. We wrote that architecture decision on a whiteboard in the kickoff and didn’t move off it. No modelling work began until the runtime, deployment, and IBP interface were settled. We apply the same architecture-first discipline in regulated-industry builds, like the bank fraud investigation copilot case study where model risk management (MRM) was the constraint; here the constraint was SOX.

Granularity that matched the unit of decision. The client’s prior model forecast at SKU-region-week. Inventory is not held at SKU-region-week. It is held at SKU-store-week. We rebuilt at the unit of decision. The accuracy improvement was substantial just from that change, before any feature engineering.
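A small numerical sketch shows why granularity matters. The two stores and their demand figures are invented for illustration; the point is only that errors which cancel in a regional total do not cancel on the shelf.

```python
def ape(forecast, actual):
    """Absolute percentage error, in percent, for a single forecast."""
    return abs(forecast - actual) / actual * 100.0


# Hypothetical one-SKU, one-week example across two stores in one region.
store_forecast = {"store-001": 150.0, "store-002": 50.0}
store_actual = {"store-001": 100.0, "store-002": 100.0}

# Measured at region level, over- and under-forecasts cancel out.
region_error = ape(sum(store_forecast.values()), sum(store_actual.values()))

# Measured at store level, where inventory is held, they do not.
store_mape = sum(
    ape(store_forecast[s], store_actual[s]) for s in store_actual
) / len(store_actual)

print(f"region-level error: {region_error:.1f}%")
print(f"store-level MAPE:   {store_mape:.1f}%")
```

In this toy case the regional forecast looks perfect while one store is overstocked by half and the other stocks out, which is the gap a SKU-region-week model cannot see.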

Feature engineering over architecture sophistication. Forty-plus engineered features: hierarchical (brand, category, store cluster, region, season), promotional flags with the correct lead and lag windows, weather signals at store-region level pulled from NOAA, holiday calendars at multi-cultural granularity (the client had been missing Lunar New Year and Diwali lift in two of its store clusters for years), and competitor promotional intelligence where the client had it. The model architecture (LightGBM) is well-understood. The features are where the win came from. Two features alone (the multi-cultural holiday calendar and the NOAA weather pull) accounted for about 4 of the 11 MAPE points.
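As a rough illustration of two of the feature families named above, the sketch below derives promotional lead and lag flags and a holiday-week flag for a single store-week. The window sizes, the cluster name, and the function name are assumptions made for the example; the source does not state the windows it used, and a real build would load holiday dates from a maintained calendar rather than hard-coding one.

```python
from datetime import date, timedelta

# Hypothetical window sizes, in weeks.
LEAD_WEEKS = 1  # weeks before a promotion (possible pre-promo dip)
LAG_WEEKS = 2   # weeks after a promotion (possible pantry-loading hangover)

# Illustrative holiday calendar keyed by a made-up store cluster name.
HOLIDAYS = {"cluster-A": {date(2024, 2, 10): "lunar_new_year"}}


def build_features(week_start, promo_week_starts, cluster):
    """Return promo and holiday flags for one store-week.

    `week_start` and every entry of `promo_week_starts` are assumed to be
    Mondays, so their differences are whole numbers of weeks.
    """
    offsets = [(week_start - p).days // 7 for p in promo_week_starts]
    week_end = week_start + timedelta(days=6)
    holidays = HOLIDAYS.get(cluster, {})
    return {
        "promo_active": int(0 in offsets),
        "promo_lead": int(any(-LEAD_WEEKS <= o < 0 for o in offsets)),
        "promo_lag": int(any(0 < o <= LAG_WEEKS for o in offsets)),
        "holiday_week": int(any(week_start <= d <= week_end for d in holidays)),
    }


promo_weeks = [date(2024, 2, 12)]
features = build_features(date(2024, 2, 5), promo_weeks, "cluster-A")
```

Keying the calendar by store cluster is the design point: a holiday flag applied network-wide would dilute the lift that only some clusters see.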

Planner workflow integration as a paired workstream. A demand model that improves accuracy but doesn’t change planner behaviour captures none of the value. We worked with the demand planning organisation to redesign the weekly cadence: what planners review, what they override, what they trust the model on. The cadence redesign was a separate workstream paired with the technical build, not an afterthought. This is where the Lean Consulting discipline shows up in an AI engagement: the management system around the model matters as much as the model.

SOX governance built into the deployment pipeline. Controlled promotion between dev, staging, and production. Signed approvals on every model version. Full audit log of forecast generation runs. The internal controls team reviewed the design in week eight and accepted it in week ten. By the time finance signed off on the working-capital methodology in week fourteen, the internal controls team had already approved the model-change governance.
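The promotion gate described here can be sketched in a few lines. This is an illustrative model, not the client's pipeline: the role names, the sign-off matrix in `REQUIRED_ROLES`, and the in-memory `AUDIT_LOG` are all assumptions, and a real deployment would persist the log to an append-only store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STAGES = ("dev", "staging", "production")

# Hypothetical approver roles per target stage.
REQUIRED_ROLES = {
    "staging": {"model_owner"},
    "production": {"model_owner", "internal_controls"},
}

AUDIT_LOG = []  # in-memory stand-in for an append-only audit store


@dataclass
class ModelVersion:
    version: str
    stage: str = "dev"
    approvals: dict = field(default_factory=dict)  # target stage -> roles signed


def _audit(event, model, detail):
    """Append a timestamped audit record for this model version."""
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "version": model.version,
        "detail": detail,
    })


def approve(model, target_stage, role):
    """Record a signed approval for promoting this version to target_stage."""
    model.approvals.setdefault(target_stage, set()).add(role)
    _audit("approval", model, {"target": target_stage, "role": role})


def promote(model, target_stage):
    """Move exactly one stage forward, only if every required role has signed."""
    current = STAGES.index(model.stage)
    if target_stage not in STAGES or STAGES.index(target_stage) != current + 1:
        raise ValueError(f"cannot promote from {model.stage} to {target_stage}")
    missing = REQUIRED_ROLES[target_stage] - model.approvals.get(target_stage, set())
    if missing:
        _audit("promotion_blocked", model, {"target": target_stage, "missing": sorted(missing)})
        raise PermissionError(f"missing approvals: {sorted(missing)}")
    model.stage = target_stage
    _audit("promotion", model, {"target": target_stage})
```

Calling `promote` on a fresh `ModelVersion` without a prior `approve` raises `PermissionError` and still writes a `promotion_blocked` record, so rejected attempts are as visible to a reviewer as successful ones.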

What we delivered

A production demand forecasting service that:

Generates SKU-store-week forecasts on a weekly schedule
Integrates with SAP IBP via the standard forecast-import interface
Provides confidence intervals at the forecast level
Surfaces feature attributions so planners understand why a forecast moved
Logs every forecast generation with model version, feature data, and resulting predictions for audit and back-test purposes
Monitors forecast accuracy continuously and flags drift to the planning organisation
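
The audit logging item in the list above might look something like the following sketch. The record layout and the `log_forecast_run` name are hypothetical, and storing a SHA-256 fingerprint of the feature data (rather than the data itself) is an assumption made to keep the example small; the fingerprint would sit alongside a pointer to the stored feature snapshot.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_forecast_run(model_version, features, predictions, log):
    """Append one audit record for a forecast generation run and return it.

    `features` and `predictions` are plain JSON-serialisable dicts here.
    Serialising with sorted keys makes the fingerprint independent of
    dict ordering, so identical inputs always hash identically.
    """
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    record = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "feature_sha256": hashlib.sha256(payload).hexdigest(),
        "predictions": predictions,
    }
    log.append(record)
    return record


run_log = []
record = log_forecast_run(
    "1.0.0",
    {"SKU-A|store-001|2024-W06": {"promo_active": 0, "holiday_week": 1}},
    {"SKU-A|store-001|2024-W06": 118.0},
    run_log,
)
```

Recording the model version and the feature fingerprint together is what makes a back-test reproducible: a reviewer can confirm which model saw which inputs for any given run.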

The planner workflow piece, including the redesigned weekly cadence, the override protocol, and the management dashboards, was a separate workstream paired with the technical build.

Plus the SOX change-governance pattern: documented promotion gates, signed approvals on every model version, audit logs reviewed by the internal controls team quarterly. The pattern was accepted by the external auditor in the next annual cycle without findings.

Plus the operational runbook for the data engineering team taking ownership. Named on-call rotation, escalation tree for forecast generation failures, retraining cadence (quarterly), drift-alert thresholds with named owners for response.

The result

Metric	Baseline	After 6 months production	Change
MAPE at SKU-store-week	24 pts	13 pts	−11 pts
Safety stock requirement (network)	$180M above optimum	$140M above optimum	−$40M
Service-level penalties (annualised)	$14M	$9.1M	−$4.9M
Planner override rate	n/a (no model)	12% (declining)	adoption healthy
Total annualised working-capital impact	n/a	n/a	$40M+

The financial impact (working capital plus service-level penalty plus obsolescence reduction) was measured by the client’s finance team independently of Rockmere. The CFO presented the number to the board in the quarter following the 6-month mark. The audit committee, which had noticed the working-capital line a year earlier, noted the reduction in the next annual review.

Engagement timeline

Week	Workstream
Weeks 1 to 2	Architecture decision, data audit, baseline MAPE measurement against the existing IBP forecast. Found the granularity gap on day three. Internal controls team briefed on SOX implications of model change governance.
Weeks 3 to 6	Feature engineering and model training. Two engineered features (multi-cultural holiday calendar and NOAA weather) accounted for about 4 of the 11 MAPE points on their own.
Weeks 7 to 9	IBP integration, production deployment pipeline, monitoring. The IBP forecast-import interface was older than expected; a planner who had run IBP for nine years walked us through the format quirks in an afternoon.
Weeks 10 to 13	Planner workflow redesign with the demand planning organisation. Watched four planners run their weekly cadence. Found that two of them were already overriding the statistical forecast in a spreadsheet they kept hidden from the planning lead. We made that workflow legitimate. SOX change-governance design accepted by internal controls team.
Weeks 14 to 16	Hardening, planner training, operational handoff to the client’s data engineering team. Finance signed off on the working-capital measurement methodology.

What survived past our engagement

Five artefacts now belong to the client.

The production model service. Operated by the client’s data engineering team on the existing Databricks/Snowflake platform. Retrained quarterly. Drift alerts wired to on-call.
The feature engineering library. Now the client’s internal demand-modelling standard. Reused on the trade promotion optimisation engagement nine months later.
The planner workflow patterns. Weekly cadence, override protocol, management dashboards. Documented and used to onboard new planners.
The SOX change-governance pattern. Documented promotion gates, signed approvals, audit logs. Reused on subsequent AI initiatives across the supply chain organisation.
A named owner with budget. The VP of Supply Chain Planning owns the service. The data engineering team has a named tech lead. Retraining spend sits in the supply chain operating budget.

A second engagement followed nine months later for a similar build on the client’s trade promotion optimisation workflow, drawing on the same feature engineering library and the same SOX change-governance pattern. The credentials behind this depth of delivery (named senior practitioners, IBP and Databricks production experience) are detailed on our credentials page.

Where this fits

This engagement is canonical for our Manufacturing practice. The model and integration work was AI Transformation; the planner workflow redesign drew on Lean Consulting for the daily and weekly cadence patterns.

The same management-system-around-the-model pattern shows up in our Tier-1 automotive supplier OEE case study, where Lean huddles were how the AI predictions actually changed behaviour. The same regulator-first build discipline shows up in our State Medicaid eligibility AI case study, where NIST AI RMF was the gating regime instead of SOX.

If you’re running demand planning on SAP IBP, Anaplan, Blue Yonder, or o9 and your forecast accuracy is limiting working capital efficiency, get in touch. We can usually estimate the achievable MAPE improvement from a 30-minute data overview call.

Frequently asked

What was the engagement timeline for this CPG demand planning AI case study?

Sixteen weeks from architecture decision to operational handoff. The MAPE improvement was visible within the first production cycle. The working-capital impact lagged roughly one full demand cycle (12 to 16 weeks) as upstream procurement and production adjusted to better signals. The $40M figure is the annualised run-rate after 6 months of production operation.

What tools and model architecture did you use?

Gradient-boosted trees (LightGBM) at SKU-store-week granularity, with hierarchical features (brand, category, store-cluster, region, season). The production service ran on the client’s existing Databricks platform, wrote forecasts to a Snowflake table, and pushed into SAP IBP via the standard forecast-import interface. We benchmarked deep learning approaches (TFT, N-BEATS) but the gradient-boosted approach matched accuracy at significantly lower operational cost. For demand forecasting, model architecture matters less than feature engineering and the right granularity.

Did this replace SAP IBP?

No. The AI model integrates with IBP as a forecast input. Planners review the AI forecast in IBP alongside the statistical forecast, can override at any level, and use IBP for the rest of the planning workflow. We did not displace the enterprise planning system.

How was the model validated and monitored?

Hold-out validation against 24 months of prior demand data, sliced by SKU-store-week to match the unit of decision. Continuous post-deployment monitoring tracked MAPE, bias, and override-rate by planner segment. Drift alerts fed the data engineering team’s on-call rotation. The financial impact (working capital plus service-level penalty plus obsolescence reduction) was measured by the client’s finance team independently of Rockmere, which the SOX control owner accepted as adequate evidence for the change.
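
As a hedged illustration of the three monitored quantities, the sketch below computes MAPE, bias, and override rate for a batch of scored forecasts and applies a simple drift rule. The row layout, the threshold, and the "two consecutive weeks" rule are assumptions for the example, not the thresholds used in the engagement.

```python
def monitoring_metrics(rows):
    """Compute MAPE, bias, and override rate (all in percent).

    Each row is a dict with `forecast`, `actual`, and `overridden` (bool).
    Bias keeps the sign of the error, so a positive value means the model
    over-forecasts on average. Zero-demand rows are excluded from the
    percentage metrics (an assumed convention).
    """
    scored = [r for r in rows if r["actual"] != 0]
    if not scored:
        raise ValueError("no rows with non-zero actual demand")
    n = len(scored)
    mape = 100.0 * sum(abs(r["forecast"] - r["actual"]) / r["actual"] for r in scored) / n
    bias = 100.0 * sum((r["forecast"] - r["actual"]) / r["actual"] for r in scored) / n
    override_rate = 100.0 * sum(1 for r in rows if r["overridden"]) / len(rows)
    return {"mape": mape, "bias": bias, "override_rate": override_rate}


def drift_alert(weekly_mape, threshold, consecutive=2):
    """True when the last `consecutive` weekly MAPE readings all exceed threshold."""
    recent = weekly_mape[-consecutive:]
    return len(recent) == consecutive and all(m > threshold for m in recent)


# Illustrative data only.
rows = [
    {"forecast": 110.0, "actual": 100.0, "overridden": False},
    {"forecast": 90.0, "actual": 100.0, "overridden": True},
]
metrics = monitoring_metrics(rows)
alert = drift_alert([13.0, 16.5, 17.2], threshold=15.0)
```

Tracking bias next to MAPE matters because the two can diverge: in the sample rows the errors are symmetric, so bias is zero even though MAPE is not. Requiring consecutive breaches is one way to keep a single noisy week from paging the on-call rotation.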

How did you address SOX and audit obligations?

Demand forecasts feed inventory valuation and revenue planning, both SOX-relevant. We worked with the client’s internal controls team to design model-change governance (controlled promotion between dev/staging/production, signed approvals on every model version, full audit log of forecast generation runs). The model is a SOX-relevant system; it was treated as one from week one.

What survived past the engagement?

The production model service itself, operated by the client’s data engineering team. The feature engineering library, now the client’s internal demand-modelling standard. The planner workflow patterns (weekly cadence, override protocol, management dashboards). The SOX change-governance pattern, reused on subsequent AI projects. And a named owner in supply chain planning with budget for ongoing retraining.

How long does the model need to be in production to deliver the $40M working-capital impact?

The MAPE improvement is immediate. The working-capital impact lags the model by roughly one full demand cycle (12 to 16 weeks) as upstream procurement and production adjust to better signals. The $40M figure is the annualised run-rate after 6 months of production operation.
