CEO, Sort String Solutions LLP
The conversation that turned five years of SalesPort data into a productised AI module happened on a Wednesday afternoon in March 2026. We had run a demand-forecasting prototype against six months of Pawanshree Dairy's distribution data — SKU-level dispatch volumes across their 140 collection routes, trained on weather, day-of-week, festival calendar, and primary-sales-velocity features. The model beat their previous manual forecast by 22% on MAPE for fast-moving SKUs.
The MD looked at the dashboard, paused for about 10 seconds, and said: "We will love to pay extra for this."
That single sentence converted "we have data" from an internal capability into a productised module set. This post walks through what that means in practice — what the data actually looks like, why classical ML beats LLM-everything for this problem, and how it ships to clients.
## The data already exists — that's the wedge
The deepest moat in any AI-for-FMCG product is not the model architecture. It's the data depth. Most standalone AI vendors targeting Indian FMCG (Aforza, Wiz.ai, RELEX, o9 Solutions) spend the first 6-12 months of an engagement just getting clean operational data flowing — fixing schema mismatches, deduplicating orders, reconciling primary vs secondary, untangling scheme attribution.
For SalesPort clients, that data is already flowing. Every order, every dispatch, every scheme application, every GPS ping from a salesperson's phone, every farmer milk-collection event — already structured, already validated, already in a database we control.
Across our 45 deployments:
- 49 Lakh+ orders with 1.96 Crore order line items
- 11.44 Lakh dispatches with full vehicle + GPS + delivery traces
- 17.43 Lakh schemes auto-applied with slab tier + claim window + price impact
- ₹2,677 Crore of payment flows across distributor and retailer wallets
- 21.64 Crore GPS data points from 2.3 Lakh daily active mobile users
- ₹803 Crore of milk procurement across 83,785 farmer accounts
That depth — and the fact that it's per-client isolated, audit-trailed, and structurally clean — is what makes the AI modules ship in weeks, not quarters.
## Why XGBoost, not GPT-4
A common question we get from technical buyers: are you using LLMs for this?
For demand forecasting, no. The problem shape is structured numerical prediction — given SKU + retailer + day-of-week + weather + holiday-calendar + promotional-status, predict next-week dispatch volume. Classical gradient-boosted regression (XGBoost) handles this better than any LLM-based approach. The features are numeric, the target is numeric, the relationships are mostly local and non-linear, and the data is tabular.
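To make that problem shape concrete, here is a minimal sketch of how one observation might be flattened into a numeric row for a tabular regressor. The field names, the one-hot day-of-week encoding, and the example values are illustrative assumptions, not SalesPort's actual schema, and the sketch stops short of the model call itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DispatchObservation:
    """One (SKU, retailer, day) observation. All field names are hypothetical."""
    sku_id: int          # integer-coded category
    retailer_id: int     # integer-coded category
    day: date
    max_temp_c: float    # weather feature
    is_holiday: bool     # from a festival/holiday calendar
    on_promotion: bool   # promotional status


def to_feature_vector(obs: DispatchObservation) -> list[float]:
    """Flatten an observation into the numeric row a tabular regressor expects."""
    # One-hot encode day of week (Monday=0 ... Sunday=6).
    dow = [0.0] * 7
    dow[obs.day.weekday()] = 1.0
    return [
        float(obs.sku_id),
        float(obs.retailer_id),
        obs.max_temp_c,
        float(obs.is_holiday),
        float(obs.on_promotion),
        *dow,
    ]


row = to_feature_vector(
    DispatchObservation(
        sku_id=101,
        retailer_id=5012,
        day=date(2026, 3, 11),
        max_temp_c=34.5,
        is_holiday=False,
        on_promotion=True,
    )
)
print(len(row))  # 5 scalar features + 7 day-of-week slots
```

Every value in the row is a number and the target (dispatch volume) is a number, which is the setting gradient-boosted trees are built for.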
For the WhatsApp order bot — where retailers text natural-language orders to the brand's WhatsApp Business number — yes, we use LLMs (specifically Anthropic Claude via API). The problem there is natural language understanding, which is genuinely what LLMs are good at.
The principle: use LLMs where they add value, classical ML where it does better. Most AI-for-FMCG buzz mixes these up. We don't.
## What the Pawanshree forecast actually looks like
The Pawanshree deployment runs daily SKU-level forecasts at 30/60/90-day horizons across 140 collection routes. Each route has 6-12 active SKUs. The forecast updates every night at 2 AM after the previous day's actuals close.
The dashboard shows three lines per SKU per route:
1. Yesterday's actual: what was actually dispatched
2. Forecast for today: what the model predicts
3. Forecast variance: the gap between forecast and actual for the most recent 7 days
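The third line can be sketched as a rolling comparison. The post does not give the exact formula behind "forecast variance", so the version below assumes it is the mean signed gap and mean absolute gap over the last 7 days; the function name and sample numbers are hypothetical.

```python
def recent_forecast_gap(
    forecast: list[float], actual: list[float], window: int = 7
) -> dict[str, float]:
    """Summarise forecast-minus-actual over the most recent `window` days."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    if len(forecast) < window:
        raise ValueError("not enough history for the requested window")
    gaps = [f - a for f, a in zip(forecast[-window:], actual[-window:])]
    return {
        # Positive means the model over-forecast on average.
        "mean_gap": sum(gaps) / window,
        # Size of the typical miss, ignoring direction.
        "mean_abs_gap": sum(abs(g) for g in gaps) / window,
    }


# Made-up daily volumes for one SKU on one route (oldest first).
forecast = [100, 110, 105, 120, 115, 100, 90, 95]
actual = [98, 112, 100, 125, 115, 104, 92, 90]
gap = recent_forecast_gap(forecast, actual)
print(gap)
```

Reporting both numbers separates bias (a consistently high or low forecast) from noise (large misses that cancel out).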
The operations team uses the forecast for two decisions: (a) procurement planning at the plant level (how much raw milk to receive), and (b) route-level dispatch sizing (so cold trucks don't run half-empty or overloaded).
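As a rough illustration of decision (b), a route forecast can be turned into a truck count and a load factor. The litre units, the capacity figure, and the function name are assumptions for the example; the post does not describe how Pawanshree actually sizes dispatches.

```python
import math


def size_dispatch(forecast_litres: float, truck_capacity_litres: float) -> tuple[int, float]:
    """Return (trucks needed, average utilisation) for one route forecast."""
    if truck_capacity_litres <= 0:
        raise ValueError("truck capacity must be positive")
    if forecast_litres <= 0:
        return 0, 0.0
    # Round up so no truck is loaded beyond capacity.
    trucks = math.ceil(forecast_litres / truck_capacity_litres)
    utilisation = forecast_litres / (trucks * truck_capacity_litres)
    return trucks, utilisation


# Hypothetical route: 5,200 litres forecast, 2,000-litre cold trucks.
trucks, utilisation = size_dispatch(5200, 2000)
print(trucks, round(utilisation, 3))
```

A low utilisation figure flags a route where the last truck runs mostly empty and loads could be consolidated.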
The initial pilot improvement was a 22% MAPE reduction on fast-moving SKUs and 14% on slow-movers. After six months of production use, the model has retrained on the new data and the lift has grown modestly: fast-moving SKUs are now at a 28% MAPE reduction and slow-movers at 18%.
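For readers who want to reproduce this kind of comparison, here is a minimal sketch of MAPE and of a reduction figure. It assumes the reduction is relative to the baseline MAPE (the post does not say whether the percentages are relative or in percentage points), and the volumes are made up.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent. Days with zero actuals are skipped."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not terms:
        raise ValueError("no non-zero actuals to score against")
    return 100.0 * sum(terms) / len(terms)


def relative_reduction(baseline_mape: float, model_mape: float) -> float:
    """Percentage by which the model's MAPE is lower than the baseline's."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


# Made-up example: three days of actuals, a manual forecast, a model forecast.
actual = [100.0, 200.0, 400.0]
manual = [80.0, 240.0, 360.0]
model = [90.0, 220.0, 380.0]

manual_mape = mape(actual, manual)
model_mape = mape(actual, model)
reduction = relative_reduction(manual_mape, model_mape)
print(round(manual_mape, 2), round(model_mape, 2), round(reduction, 1))
```

Skipping zero-volume days keeps the metric defined, but it also hides misses on those days, which matters more for slow-moving SKUs than for fast-movers.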
## The productisation — what every SalesPort client gets
The Pawanshree custom build became the template for what every SalesPort client gets as a +₹25K/month add-on. The deployment is configuration-driven, not a new engineering project per client. The model retrains automatically on each client's data; no client's data leaks into another's model.
Three deployment phases:
1. Data validation (week 1): confirm 12+ months of clean data, check for SKU master alignment, validate scheme attribution
2. Initial training (week 2): fit XGBoost models per major SKU cluster, validate against held-out test sets, surface variance warnings to the client's ops team
3. Production rollout (weeks 3-4): wire the forecasts into the Live Sales Analytics Dashboard, train the client's ops team on interpreting forecast variance
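Two of the checks in phases 1 and 2 can be sketched in a few lines: measuring how many months of history exist, and carving out a held-out test set. The chronological split is an assumption (the post does not say how the held-out sets are built), and the function names and sample rows are hypothetical.

```python
from datetime import date


def months_of_history(days: list[date]) -> int:
    """Whole calendar months between the earliest and latest record."""
    first, last = min(days), max(days)
    return (last.year - first.year) * 12 + (last.month - first.month)


def chronological_split(rows: list[tuple[date, float]], holdout_fraction: float = 0.2):
    """Split (date, volume) rows so the test set is strictly the most recent data."""
    ordered = sorted(rows, key=lambda r: r[0])
    n_test = max(1, round(len(ordered) * holdout_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Made-up monthly samples for one SKU cluster.
rows = [(date(2025, m, 15), 100.0 + m) for m in range(1, 13)] + [
    (date(2026, 1, 15), 120.0),
    (date(2026, 2, 15), 118.0),
    (date(2026, 3, 10), 125.0),
]

history_months = months_of_history([d for d, _ in rows])
enough_history = history_months >= 12  # the 12+ month threshold from phase 1
train, test = chronological_split(rows)
print(history_months, enough_history, len(train), len(test))
```

Splitting by time rather than at random keeps future observations out of the training set, so the held-out score is closer to what the forecast would achieve in production.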
For clients with fewer than 12 months of SalesPort data, the forecasting module can still light up at lower confidence, typically with a MAPE reduction of 10-15% on fast-movers. The 22%+ numbers from Pawanshree come from a longer data history.
## What ships next
Demand forecasting is the first of seven AI modules. The roadmap:
- Q2 2026 (now): Demand Forecasting + Live Sales Analytics Dashboard
- Q3 2026: Route Optimisation + Trade Promotion ROI Engine + WhatsApp Order Bot
- Q4 2026: Image Recognition (Perfect Store) for shelf audits
- Q1 2027: Distributor Credit Scoring
The pattern is the same across all of them — productise data the client already owns, ship as an add-on to existing SalesPort AMC, light up in weeks instead of quarters.
The wedge that the Pawanshree MD opened with "we will love to pay extra for this" is the same wedge for every existing SalesPort client. Data they're already generating, productised into AI modules they actually use.
## Frequently Asked Questions
