6.7 MVP vs advanced scorecard; ML roadmap

At MVP — first 12 – 18 months, first ~5,000 – 10,000 loans — the scorecard is fully deterministic and rule-based.

Inputs:

  • Bureau score (4 CICs; worst-of for floor).
  • GST scorecard outputs (turnover, consistency, concentration, recon).
  • BSA scorecard outputs (ABB, turnover, bounces, FOIR).
  • Cash-flow analysis (DSC, working-capital cycle).
  • Tally scorecard outputs (where available; an accelerator, not gating at MVP).
  • Fraud signals.

Logic:

  • Each scorecard outputs a sub-grade (A / B / C / D).
  • The worst-of (or a weighted average) of the sub-grades determines the overall risk grade.
  • Pricing, exposure, and tenure follow grid lookups.

Strengths:

  • Explainable: every decision has a clear reason.
  • Auditable: an engineer can trace exactly why an application was approved or declined.
  • Easy to evolve: change a threshold in the admin console, test in the sandbox, deploy.
  • Complies with FPC: the decline reason given to the borrower is straightforward.

Weaknesses:

  • Not optimal: rules do not capture interactions between features.
  • Static: calibration lags changing market behaviour.
  • High false-decline rate: borrowers who would have paid are declined because of one weak dimension.
  • High false-approve rate at the margin: clean files with hidden risk slip through.
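
As a minimal sketch of the worst-of grading described above, the snippet below takes one sub-grade per scorecard and returns the overall grade together with the scorecards that drove it. The function name, the scorecard keys, and the example grades are illustrative assumptions, not part of any production rule engine.

```python
from typing import Mapping

# Grades ordered best to worst: "A" is strongest, "D" is weakest.
GRADE_ORDER = ["A", "B", "C", "D"]


def worst_of_grade(sub_grades: Mapping[str, str]) -> tuple[str, list[str]]:
    """Return the overall grade and the scorecards responsible for it."""
    if not sub_grades:
        raise ValueError("at least one sub-grade is required")
    # Map each scorecard to the rank of its grade (0 = best, 3 = worst).
    ranks = {name: GRADE_ORDER.index(grade) for name, grade in sub_grades.items()}
    worst_rank = max(ranks.values())
    # The drivers double as the decline reason shown to the borrower.
    drivers = sorted(name for name, rank in ranks.items() if rank == worst_rank)
    return GRADE_ORDER[worst_rank], drivers


# Hypothetical application: strong bureau, one weak BSA dimension.
example = {"bureau": "A", "gst": "B", "bsa": "C", "cash_flow": "B"}
overall, drivers = worst_of_grade(example)
print(overall, drivers)  # C ['bsa']
```

The example also shows where the false-decline weakness comes from: a single C on the BSA scorecard sets the overall grade, regardless of how strong the other dimensions are.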

At phase 4+ (10,000+ loans, defined defaults observed, mature data platform), evolve to a model-based scorecard with rule-based guardrails.

A typical SME WC model uses 100 – 500 engineered features:

  • Bureau-derived: score, DPD vintage, enquiry velocity, write-off history, mix of secured / unsecured.
  • GST-derived: turnover, growth, volatility, seasonality, customer concentration, filing recency.
  • BSA-derived: ABB, MAB, credit / debit turnover, EMI obligations, bounce count, balance trend, cash deposit ratio, related-party transfer ratio, salary vs business credit identification.
  • Cash-flow: net cash flow, DSC, volatility.
  • Tally-derived (where present): revenue, margin, ageing, inventory turn, debtor concentration.
  • Behavioural: time-of-day pattern, device, channel.
  • Anchor-derived (SCF): months with anchor, transaction count, dispute count, return rate.
  • Repeat-borrower: prior loan performance, prior limit utilisation, cure history.

Model choice:

  • Gradient boosting (XGBoost, LightGBM) is the standard for tabular credit data in India.
  • Logistic regression serves as a transparent baseline.
  • Neural networks rarely justify their complexity for tabular SME data.
  • Survival models give tenor-aware default probability.

Prediction targets:

  • Probability of default (90+ DPD within 12 months).
  • Probability of cure (whether an account at 30 / 60 DPD will cure).
  • Expected loss given default (LGD).
  • Expected exposure at default (EAD), for revolving lines.

Integration with rules:

  • Rules remain as hard cuts (regulatory, fraud, exposure caps).
  • The model produces a score, which is bucketed into grades and mapped to pricing.
  • Model and rules together: the model cannot override a hard cut; a rule can override a model approve to a decline; a rule cannot override a model decline to an approve without an explicit deviation.
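
The precedence between model and rules described above can be sketched as a small decision function. This is an illustration under assumptions: the `Decision` type, the `combine` function, and its parameter names are hypothetical, and a real engine would carry richer reason codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    outcome: str  # "approve" or "decline"
    reason: str


def combine(
    model_approves: bool,
    failed_hard_cut: Optional[str] = None,
    rule_decline: Optional[str] = None,
    deviation_ref: Optional[str] = None,
) -> Decision:
    """Combine a model outcome with rule outcomes (hypothetical interface)."""
    # 1. Hard cuts (regulatory, fraud, exposure caps) always win.
    if failed_hard_cut is not None:
        return Decision("decline", f"hard cut: {failed_hard_cut}")
    # 2. A rule may turn a model approve into a decline.
    if rule_decline is not None:
        return Decision("decline", f"rule: {rule_decline}")
    # 3. No rule objects, so a model approve stands.
    if model_approves:
        return Decision("approve", "model score in an approved grade")
    # 4. A model decline becomes an approve only with an explicit deviation.
    if deviation_ref is not None:
        return Decision("approve", f"deviation: {deviation_ref}")
    return Decision("decline", "model score in a declined grade")


print(combine(True, failed_hard_cut="exposure cap"))
print(combine(False, deviation_ref="DEV-001"))
```

Ordering the checks this way means a deviation can never rescue an application that failed a hard cut, which matches the intent that hard cuts sit above both the model and discretionary overrides.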

When introducing a new model or policy:

  • Shadow run: new policy / model runs alongside production; outcomes captured but production decision is taken by the incumbent.
  • A/B: a defined 5 – 20% of new applications routed to the challenger.
  • Vintage measurement: track outcomes over time; compare default rates, approval rates, profit per loan.
  • Promotion: the challenger becomes champion if its performance is better and stable over a defined observation period.
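
One way to implement the A/B split above is a deterministic hash of the application ID, so the same application always lands in the same arm and the assignment can be reproduced during vintage measurement. The sketch below assumes a 10% challenger share; the function name, the salt, and the ID format are hypothetical.

```python
import hashlib


def assign_arm(
    application_id: str,
    challenger_share: float = 0.10,
    salt: str = "challenger-v1",
) -> str:
    """Route an application to "champion" or "challenger" deterministically."""
    if not 0.0 <= challenger_share <= 1.0:
        raise ValueError("challenger_share must be between 0 and 1")
    # Hash the salted ID and map the first 8 bytes to a value in [0, 1).
    digest = hashlib.sha256(f"{salt}:{application_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "challenger" if bucket < challenger_share else "champion"


# The assignment is stable across calls for the same application ID.
arm = assign_arm("APP-000123")
assert arm == assign_arm("APP-000123")

# Across many applications, the realised share approximates the target.
arms = [assign_arm(f"APP-{i:06d}") for i in range(10_000)]
print(arms.count("challenger") / len(arms))
```

Changing the salt for each new experiment reshuffles the buckets, so the same borrowers are not always the ones exposed to challengers.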

Per the RBI Fair Practices Code and DPDP good practice, model-based decisions must be explainable to the borrower (in suitable terms).

  • Feature attribution at decision time (SHAP values cached per decision).
  • Top-3 contributing factors in human-readable form for borrower-facing decline letter.
  • Engineering / risk team sees full feature-attribution for analysis.

Monitoring:

  • Population stability index (PSI) for input drift.
  • Calibration: predicted probability vs realised default.
  • Discrimination (Gini, KS) on vintaged cohorts.
  • Decision distribution: approval rate by grade and by channel, with drift detection.
  • Fairness: outcomes by geography / borrower segment, to ensure no inadvertent discrimination.

Infrastructure:

  • Build a feature store (Feast or homegrown) so the same feature definition feeds training and production.
  • Build a model registry (MLflow) for model versioning.
  • Build a deployment pipeline that promotes from staging to production via champion-challenger.
  • Build a monitoring pipeline that runs daily.
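
The population stability index (PSI) check listed above compares the binned distribution of a feature at training time with its distribution in recent production traffic. A minimal sketch using only the standard library follows; the function name, the bin counts, and the small floor used to avoid division by zero are illustrative assumptions.

```python
import math
from typing import Sequence


def psi(
    expected_counts: Sequence[float],
    actual_counts: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    if not expected_counts or len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs need the same non-zero number of bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("bin counts must sum to a positive number")
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Convert counts to shares; floor at eps so empty bins stay finite.
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        total += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return total


# Hypothetical quartile bins of a feature: training vs last month.
training = [25, 25, 25, 25]
last_month = [10, 20, 30, 40]
print(round(psi(training, training), 4))  # 0.0
print(round(psi(training, last_month), 4))
```

A commonly cited rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but the alert thresholds are a policy choice and should be set per feature.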

| Phase | State |
| --- | --- |
| Phase 1 (MVP, year 1) | Rule-based with worst-of grading; manual review for refer cases. |
| Phase 2 (year 1.5) | Refine rules based on first cohorts; introduce tighter scorecards. |
| Phase 3 (year 2) | Logistic baseline as challenger; champion-challenger run. |
| Phase 4 (year 2.5) | Gradient-boosting model in production for approval-side scoring; rules retained for cuts. |
| Phase 5 (year 3+) | Multi-model ensembles; segment-specific models; ECL / LGD models for IRR-based pricing. |

The honest reality: many lenders never need to graduate past Phase 2 rule-based scorecards because the wedge they win on isn’t model superiority but distribution + execution. Don’t over-invest in ML before the rule engine is well-instrumented and tuned.