6.7 MVP vs advanced scorecard; ML roadmap

MVP scorecard (deterministic, rule-based)

At MVP (the first 12 – 18 months, roughly the first 5,000 – 10,000 loans), the scorecard is fully deterministic and rule-based.

Inputs

Bureau score (4 CICs; worst-of for floor).
GST scorecard outputs (turnover, consistency, concentration, recon).
BSA scorecard outputs (ABB, turnover, bounces, FOIR).
Cash-flow analysis (DSC, working-capital cycle).
Tally scorecard outputs (where available — accelerator, not gating at MVP).
Fraud signals.

Mechanism

Each scorecard outputs a sub-grade (A / B / C / D).
The overall risk grade is the worst of the sub-grades (a weighted average is the alternative combination rule).
Pricing, exposure, tenure follow grid lookups.
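The mechanism above can be sketched in a few lines. This is a minimal illustration, assuming worst-of grading and a grade-keyed grid; the grid values, the scorecard names and the function names are hypothetical, not the actual policy.

```python
# Sub-grades ordered from best to worst.
GRADE_ORDER = ["A", "B", "C", "D"]

# Hypothetical grid: every number here is a placeholder, not policy.
# Grade "D" has no entry, which this sketch treats as a decline.
GRID = {
    "A": {"rate_pct": 16.0, "max_exposure_inr": 5_000_000, "max_tenure_months": 24},
    "B": {"rate_pct": 19.0, "max_exposure_inr": 3_000_000, "max_tenure_months": 18},
    "C": {"rate_pct": 23.0, "max_exposure_inr": 1_000_000, "max_tenure_months": 12},
}


def overall_grade(sub_grades: dict) -> str:
    """Worst-of: the overall grade is the weakest sub-grade."""
    return max(sub_grades.values(), key=GRADE_ORDER.index)


def mvp_decision(sub_grades: dict) -> dict:
    """Grade the file, look up terms, and record which scorecards drove the grade."""
    grade = overall_grade(sub_grades)
    terms = GRID.get(grade)  # None means no grid row, i.e. decline
    driven_by = sorted(name for name, g in sub_grades.items() if g == grade)
    return {
        "grade": grade,
        "approved": terms is not None,
        "terms": terms,
        "driven_by": driven_by,  # doubles as the explainable decline / pricing reason
    }


result = mvp_decision({"bureau": "A", "gst": "B", "bsa": "C", "cash_flow": "B"})
```

Recording `driven_by` alongside the grade is what keeps the decision traceable: the weakest scorecard is the stated reason.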

Pros

Explainable: every decision has a clear reason.
Auditable: an engineer can trace exactly why a file was approved or declined.
Easy to evolve: change a threshold in the admin console, test it in the sandbox, deploy.
Compliant with the Fair Practices Code (FPC): the decline reason given to the borrower is straightforward.

Cons

Not optimal — rules don’t capture interactions between features.
Static — calibration to changing market behaviour lags.
High false-decline rate — borrowers who would have paid are declined due to one weak dimension.
High false-approve rate at the margin — clean files with hidden risk slip through.

Advanced scorecard (model-based)

At Phase 4+ (10,000+ loans, enough defaults observed under a fixed definition, mature data platform), evolve to a model-based scorecard with rule-based guardrails.

Inputs (engineered features)

A typical SME WC model uses 100 – 500 engineered features:

Bureau-derived: score, DPD vintage, enquiry velocity, write-off history, mix of secured / unsecured.
GST-derived: turnover, growth, volatility, seasonality, customer concentration, filing recency.
BSA-derived: ABB, MAB, credit / debit turnover, EMI obligations, bounce count, balance trend, cash deposit ratio, related-party transfer ratio, salary vs business credit identification.
Cash-flow: net cash flow, DSC, volatility.
Tally-derived (where present): revenue, margin, ageing, inventory turn, debtor concentration.
Behavioural: time-of-day pattern, device, channel.
Anchor-derived (SCF): months with anchor, transaction count, dispute count, return rate.
Repeat-borrower: prior loan performance, prior limit utilisation, cure history.

Models

Gradient boosting (XGBoost, LightGBM) is the standard for tabular credit data in India.
Logistic regression as a transparent baseline.
Neural networks rarely justify the complexity for tabular SME data.
Survival models for tenor-aware default probability.

Targets

Probability of default (90+ DPD within 12 months).
Probability of cure (given an account at 30 or 60 DPD, the probability that it returns to current).
Loss given default (LGD).
Exposure at default (EAD), for revolving lines.
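As a rough sketch of how the default target might be labelled and how the targets combine, assuming each loan carries a list of its worst DPD per month on book (the input shape and function names are hypothetical). The expected-loss line uses the standard identity EL = PD × LGD × EAD.

```python
from typing import List, Optional


def pd_label(monthly_max_dpd: List[int],
             horizon_months: int = 12,
             dpd_threshold: int = 90) -> Optional[int]:
    """Label a loan for the PD target.

    Returns 1 if the loan reached 90+ DPD inside the horizon,
    0 if the full horizon was observed without reaching it,
    None if the loan is too young to label (immature window).
    """
    window = monthly_max_dpd[:horizon_months]
    if any(dpd >= dpd_threshold for dpd in window):
        return 1
    if len(window) < horizon_months:
        return None  # exclude from training rather than label as good
    return 0


def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """Expected loss as PD x LGD x EAD."""
    return pd * lgd * ead
```

Returning `None` for immature loans matters: labelling a six-month-old loan as "good" would bias the model towards optimism.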

Combination with rules

Rules remain as hard cuts (regulatory, fraud, exposure caps).
Model produces a score; bucketed into grades; mapped to pricing.
Precedence when model and rules disagree: the model cannot override a hard cut; a rule can turn a model approve into a decline; a model decline can be turned into an approve only through an explicit deviation.
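The precedence between rules and model can be made explicit in code. The sketch below is one possible reading, with hypothetical PD cut-offs and names; it assumes that grade D is the model decline and that a deviation can lift only a model decline, never a hard cut or a rule decline.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

# Hypothetical PD cut-offs: A if pd < 0.02, B if < 0.05, C if < 0.10, else D.
PD_CUTOFFS = [0.02, 0.05, 0.10]
GRADES = ["A", "B", "C", "D"]


@dataclass
class Decision:
    outcome: str  # "approve" or "decline"
    grade: str
    reason: str


def grade_from_pd(pd: float) -> str:
    """Bucket a model PD into a grade."""
    return GRADES[bisect.bisect_right(PD_CUTOFFS, pd)]


def final_decision(pd: float,
                   hard_cut: Optional[str] = None,
                   rule_decline: Optional[str] = None,
                   deviation_ref: Optional[str] = None) -> Decision:
    grade = grade_from_pd(pd)
    if hard_cut:       # regulatory, fraud, exposure caps: nothing overrides these
        return Decision("decline", grade, f"hard cut: {hard_cut}")
    if rule_decline:   # a rule may turn a model approve into a decline
        return Decision("decline", grade, f"rule: {rule_decline}")
    if grade == "D":   # model decline stands unless an explicit deviation exists
        if deviation_ref:
            return Decision("approve", grade, f"deviation: {deviation_ref}")
        return Decision("decline", grade, "model grade below approval cut-off")
    return Decision("approve", grade, "model approve, no rule objection")
```

Checking the hard cuts first means the deviation path is never even reached for a regulatory or fraud decline.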

Champion-challenger framework

When introducing a new model or policy:

Shadow run: new policy / model runs alongside production; outcomes captured but production decision is taken by the incumbent.
A/B: a defined 5 – 20% of new applications routed to the challenger.
Vintage measurement: track outcomes over time; compare default rates, approval rates, profit per loan.
Promotion: the challenger becomes champion if its performance is better and stable over a defined observation period.
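For the A/B step, one common approach is deterministic hash-based routing, so that the same application always lands in the same arm and the split can be reproduced later. A minimal sketch, assuming a string application ID; the salt value and function name are hypothetical.

```python
import hashlib


def assign_arm(application_id: str,
               challenger_share: float = 0.10,
               salt: str = "experiment-1") -> str:
    """Route an application to "champion" or "challenger".

    The ID is hashed with a per-experiment salt and mapped to [0, 1);
    IDs below challenger_share go to the challenger. Changing the salt
    reshuffles the assignment for a new experiment.
    """
    digest = hashlib.sha256(f"{salt}:{application_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "challenger" if bucket < challenger_share else "champion"


arms = [assign_arm(f"APP-{i}") for i in range(10_000)]
observed_share = arms.count("challenger") / len(arms)
```

In a shadow run the same function can be used with `challenger_share=0.0`: every production decision stays with the champion while the challenger's output is only logged.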

Explainability

Per the RBI Fair Practices Code and DPDP good practice, model-based decisions must be explainable to the borrower in suitable terms.

Feature attribution at decision time (SHAP values cached per decision).
Top-3 contributing factors in human-readable form for borrower-facing decline letter.
Engineering / risk team sees full feature-attribution for analysis.
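A sketch of how cached attributions might be turned into the top-3 borrower-facing reasons. It assumes the attributions are already computed (for example as SHAP values) and that a positive value pushes the decision towards decline; the feature names and reason wording are hypothetical.

```python
# Hypothetical mapping from internal feature names to borrower-friendly wording.
REASON_TEXT = {
    "bounce_count": "Cheque or mandate bounces in recent bank statements",
    "gst_filing_recency": "Delays in recent GST filings",
    "enquiry_velocity": "A high number of recent credit enquiries",
    "abb": "Low average bank balance relative to the requested amount",
}
FALLBACK_TEXT = "Other factors in the application"


def top_decline_reasons(attributions: dict, n: int = 3) -> list:
    """Return the n strongest adverse factors in human-readable form.

    Only features with a positive (risk-increasing) attribution qualify;
    features that helped the borrower are never shown as decline reasons.
    """
    adverse = [(name, value) for name, value in attributions.items() if value > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [REASON_TEXT.get(name, FALLBACK_TEXT) for name, _ in adverse[:n]]


cached = {
    "bounce_count": 0.8,
    "abb": 0.3,
    "gst_filing_recency": 0.5,
    "turnover": -0.4,
    "enquiry_velocity": 0.1,
}
reasons = top_decline_reasons(cached)
```

The full `cached` dictionary is what the engineering and risk teams would inspect; the borrower sees only the mapped text.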

Model monitoring

Population stability index (PSI) for input drift.
Calibration — predicted probability vs realised default.
Discrimination (Gini, KS) on vintaged cohorts.
Decision distribution — approval rate by grade, by channel, drift detection.
Fairness — outcomes by geography / borrower-segment; ensure no inadvertent discrimination.
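Two of these checks are short enough to sketch in plain Python: PSI over fixed bin edges, and Gini computed as 2 × AUC − 1. The bin edges, the epsilon floor and the function names are assumptions for illustration; a production job would typically use a vectorised library and derive the bins from the training sample.

```python
import math
from bisect import bisect_right


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between a baseline and a current sample.

    edges are the upper bin boundaries; empty bins are floored at eps
    so that the logarithm stays defined.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def gini(scores, defaulted):
    """Gini = 2 * AUC - 1, with AUC from pairwise comparison (ties count half).

    scores are predicted risk (higher = riskier); defaulted is 1 / 0.
    Quadratic in sample size, so suitable only for small cohorts.
    """
    bads = [s for s, d in zip(scores, defaulted) if d]
    goods = [s for s, d in zip(scores, defaulted) if not d]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return 2 * wins / (len(bads) * len(goods)) - 1


baseline = list(range(100))
shifted = [v + 30 for v in baseline]
edges = [25, 50, 75]
drift = psi(baseline, shifted, edges)
```

A common rule of thumb treats a PSI above roughly 0.25 as a material shift, but the alert threshold should be calibrated on the lender's own portfolio.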

Data and ML platform

Build feature store (Feast or homegrown) so the same feature definition feeds training and production.
Build model registry (MLflow) for model versioning.
Build deployment pipeline that promotes from staging to production via champion-challenger.
Build monitoring pipeline that runs daily.

Roadmap

Phase	State
Phase 1 (MVP, year 1)	Rule-based with worst-of grading; manual review for refer cases.
Phase 2 (year 1.5)	Refine rules based on first cohorts; introduce tighter scorecards.
Phase 3 (year 2)	Logistic baseline as challenger; champion-challenger run.
Phase 4 (year 2.5)	Gradient-boosting model in production for approval-side scoring; rules retained for cuts.
Phase 5 (year 3+)	Multi-model ensembles; segment-specific models; ECL / LGD models for IRR-based pricing.

The honest reality: many lenders never need to graduate past Phase 2 rule-based scorecards because the wedge they win on isn’t model superiority but distribution + execution. Don’t over-invest in ML before the rule engine is well-instrumented and tuned.