5.1 High-level architecture
The picture
The platform stack from channel UIs down to data:

┌──────────────────────────────────────────────────────────┐
│  CHANNEL UIs                                             │
│  Borrower web/app | DSA portal | CA portal | Partner     │
│  Anchor portal | Admin console | Field-agent app         │
└─────────┬────────────────────────────────────────────────┘
          │ HTTPS, OAuth2, MFA
┌─────────▼────────────────────────────────────────────────┐
│  EDGE                                                    │
│  CDN | WAF | API Gateway | Rate limiter | Request log    │
└─────────┬────────────────────────────────────────────────┘
          │
┌─────────▼────────────────────────────────────────────────┐
│  BFF / GraphQL or REST for channel UIs                   │
│  Per-channel BFFs for borrower / partner / admin         │
└─────────┬────────────────────────────────────────────────┘
          │
┌─────────▼────────────────────────────────────────────────┐
│  DOMAIN SERVICES (modular monolith)                      │
│  Acquisition | Application | KYC | Ingestion |           │
│  Decisioning | Manual review | Docs | Disbursement |     │
│  LMS | Collections | Monitoring | Accounting |           │
│  Reporting | Admin | Notification | Co-lending |         │
│  Settlement                                              │
└─────────┬────────────────────────────────────────────────┘
          │ REST + async events
┌─────────▼────────────────────────────────────────────────┐
│  INTEGRATION LAYER (vendor adapters)                     │
│  Bureau | KYC | AA | BSA | GST | Tally | eSign |         │
│  eStamp | NACH | UPI | PG | Payout | Sponsor bank |      │
│  SMS / WA / Email / IVR / Dialer | Field-app push        │
└─────────┬────────────────────────────────────────────────┘
          │
┌─────────▼────────────────────────────────────────────────┐
│  DATA LAYER                                              │
│  PostgreSQL (OLTP) | Redis (cache) | Object store (S3)   │
│  Kafka / RabbitMQ (events) | OpenSearch | Warehouse +    │
│  dbt | Vault / KMS                                       │
└──────────────────────────────────────────────────────────┘

Each layer has one job. Each layer can be scaled or replaced independently.
Deployment shape
- Cloud: AWS Mumbai (ap-south-1) primary across 2–3 AZs; AWS Hyderabad (ap-south-2) DR with continuous replication.
- Compute: Kubernetes (EKS) for stateless services. Aurora PostgreSQL Multi-AZ for OLTP. ElastiCache (Redis) Multi-AZ.
- Event bus: RabbitMQ at MVP, MSK (managed Kafka) at scale (see 5.4).
- Object storage: S3 with versioning + Object Lock on evidence buckets.
- Search: OpenSearch for log indexing + free-text search of cases / documents.
- Warehouse: Snowflake or ClickHouse at scale; PostgreSQL replica at MVP (see 5.6).
- CDN: CloudFront / Cloudflare for portals and static assets.
- Observability: Datadog, or Prometheus + Grafana + Loki + Tempo.
- Secrets: AWS Secrets Manager + KMS, or HashiCorp Vault.
Alternative clouds (Azure Pune/Mumbai, OCI Mumbai/Hyderabad, GCP Mumbai) are valid; the architectural shape is portable.
Multi-AZ + DR per RBI IT MD
The RBI IT MD (2.13) expects geographic separation between primary and DR.
- Primary: ap-south-1 across 2–3 AZs.
- DR: ap-south-2 (Hyderabad).
- RPO: <= 15 minutes for OLTP (continuous CDC + cross-region replication).
- RTO: <= 4 hours for full system recovery.
- DR drill: quarterly for critical systems; annual end-to-end.
- Runbook: documented per failure scenario; tested in drills.
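The RPO and RTO targets above lend themselves to simple automated checks, for example in a replication-lag alert or a drill report. The following is a minimal sketch only; the function names and the idea of feeding it a "last replicated" timestamp are assumptions, not part of the platform as described.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(minutes=15)  # target stated above for OLTP
RTO = timedelta(hours=4)     # target stated above for full system recovery


def rpo_breached(last_replicated_at: datetime, now: datetime) -> bool:
    """True if the DR copy is further behind than the RPO target allows."""
    return now - last_replicated_at > RPO


def rto_met(outage_started_at: datetime, recovered_at: datetime) -> bool:
    """True if a (real or drilled) recovery finished within the RTO target."""
    return recovered_at - outage_started_at <= RTO


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(rpo_breached(now - timedelta(minutes=20), now))
```

In a drill, the same two functions could be applied to the timestamps recorded in the runbook to produce a pass/fail line per scenario.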
Request flows (canonical examples)
Flow A: Borrower applies via own portal
- Borrower lands on web/app → mobile-OTP authentication.
- Borrower UI calls Borrower-BFF.
- BFF calls Application service → creates Application (status draft).
- Borrower fills KYC, gives AA + GST + bureau consents; Ingestion pulls data in parallel.
- Borrower submits → Decisioning runs full engine → returns APPROVE / DECLINE / REFER.
- On APPROVE: Sanction created → Docs service generates KFS + agreement.
- Borrower acknowledges KFS, eSigns multi-signer; eStamp issued.
- NACH mandate activates (eNACH via Aadhaar).
- Pre-disbursement checklist runs → Disbursement service triggers payout via sponsor-bank API → UTR captured.
- LMS activates loan account → daily accrual begins → classification job runs end-of-day.
See 5.8 Sequence diagrams for the full ladder.
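The hand-off from Decisioning in Flow A can be pictured as a small routing table keyed on the three outcomes. This is a sketch under stated assumptions: only the draft status and the APPROVE / DECLINE / REFER outcomes come from the flow; the other status names, and the routing of REFER to manual review, are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "APPROVE"
    DECLINE = "DECLINE"
    REFER = "REFER"


# Hypothetical status names; the flow above names only "draft".
NEXT_STATUS = {
    Decision.APPROVE: "sanctioned",
    Decision.DECLINE: "declined",
    Decision.REFER: "manual_review",  # assumed: REFER goes to the Manual review module
}


def on_decision(current_status: str, decision: Decision) -> str:
    """Return the application's next status once Decisioning has responded."""
    if current_status != "submitted":
        raise ValueError(f"cannot decide an application in status {current_status!r}")
    return NEXT_STATUS[decision]


if __name__ == "__main__":
    print(on_decision("submitted", Decision.REFER))
```

Rejecting a decision on anything other than a submitted application keeps a replayed or out-of-order decision event from moving a draft forward.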
Flow B: DSA-assisted application
- DSA logs into DSA portal (SSO via internal IDP).
- DSA-BFF wraps the same Application service but with DSA-attribution metadata captured upfront.
- Borrower receives consent links via SMS / WhatsApp; completes KYC / AA from their own phone.
- Remainder same as Flow A.
- On disbursement, DSA payout accrual is created and visible on the DSA portal.
Flow C: Co-lent loan (CLM-1, single partner)
- Application + decision as Flow A; decision engine runs both originator’s and partner’s policy in parallel.
- On dual approve, Co-lending Allocation service splits the loan (80:20 partner:NBFC by default).
- Sanction reflects both lenders; KFS discloses both.
- Disbursement coordinates with sponsor-bank escrow: partner funds released → originator funds released → combined amount to borrower with single UTR.
- LMS books two share-level ledgers + one consolidated borrower ledger.
- Daily / weekly settlement service moves funds from escrow to each lender per agreement.
- NPA classification (when triggered) updates both lenders in lockstep.
See 7. Co-lending deep dive and 5.8 Sequence diagrams for full mechanics.
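The default 80:20 partner:NBFC split in Flow C can be sketched as below. This is illustrative only: the function name is hypothetical, and assigning the rounding residue to the NBFC share is an assumption (the actual rule would come from the co-lending agreement). The point it shows is that the two shares should always sum exactly to the sanctioned amount.

```python
from decimal import Decimal, ROUND_DOWN

PAISA = Decimal("0.01")


def split_principal(principal: Decimal, partner_pct: int = 80) -> tuple[Decimal, Decimal]:
    """Split a sanctioned amount into (partner_share, nbfc_share)."""
    if not 0 < partner_pct < 100:
        raise ValueError("partner_pct must be between 1 and 99")
    # Round the partner share down to the paisa...
    partner = (principal * partner_pct / 100).quantize(PAISA, rounding=ROUND_DOWN)
    # ...and let the NBFC share absorb the residue (assumed rule).
    nbfc = principal - partner
    return partner, nbfc


if __name__ == "__main__":
    print(split_principal(Decimal("100000.01")))
```

Using Decimal rather than float avoids binary rounding drift between the two share-level ledgers and the consolidated borrower ledger.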
Flow D: Repayment via NACH
- End-of-day, LMS generates the NACH presentation file for tomorrow’s due dates.
- NACH adapter pushes file to sponsor bank via SFTP at the cut-off.
- Sponsor bank submits to NPCI; NPCI processes overnight.
- Sponsor bank returns ack file next day with success / bounce per row.
- LMS processes each result:
  - Success: repayment recorded → waterfall allocation (penal → fees → interest → principal) → events emitted.
  - Bounce: bounce fee applied → re-presentation per NACH rules → if persistent, case enters collections queue.
- Reconciliation engine matches sponsor-bank statement against expected; exceptions queue.
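The waterfall allocation named in Flow D (penal → fees → interest → principal) can be sketched as follows. The bucket order comes from the flow; the function name, the dictionary shape, and returning any excess separately are assumptions made for illustration.

```python
from decimal import Decimal

# Order taken from Flow D: penal, then fees, then interest, then principal.
WATERFALL = ("penal", "fees", "interest", "principal")


def allocate(payment: Decimal, dues: dict[str, Decimal]) -> tuple[dict[str, Decimal], Decimal]:
    """Apply a payment to outstanding dues in waterfall order.

    Returns (amount applied per bucket, unallocated excess).
    """
    remaining = payment
    allocation = {}
    for bucket in WATERFALL:
        applied = min(remaining, dues.get(bucket, Decimal("0")))
        allocation[bucket] = applied
        remaining -= applied
    return allocation, remaining


if __name__ == "__main__":
    dues = {"penal": Decimal("100"), "fees": Decimal("50"),
            "interest": Decimal("1000"), "principal": Decimal("5000")}
    print(allocate(Decimal("1200"), dues))
```

With the dues shown, a payment of 1200 clears penal, fees, and interest in full and leaves 50 for principal.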
Trust boundaries
- Public internet ↔ edge: WAF + rate limit + mTLS for partner APIs.
- Edge ↔ internal services: internal network; mTLS recommended for sensitive paths.
- Services ↔ external vendors: outbound API gateway / egress proxy with allowlist; centralised egress logging.
- Service ↔ database: IAM-authenticated; least-privilege; no shared admin credentials.
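The egress allowlist on the services ↔ vendors boundary can be reduced to a host check like the one below. It is a sketch, not the proxy's actual configuration: the hostnames are placeholders, and requiring HTTPS plus an exact host match is an assumed policy.

```python
from urllib.parse import urlsplit

# Hypothetical vendor hostnames, for illustration only.
EGRESS_ALLOWLIST = {"api.bureau.example", "esign.vendor.example"}


def egress_allowed(url: str, allowlist: set[str] = EGRESS_ALLOWLIST) -> bool:
    """Allow only HTTPS calls whose host exactly matches an allowlisted entry."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowlist


if __name__ == "__main__":
    print(egress_allowed("https://api.bureau.example/v1/pull"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids being fooled by an allowlisted name appearing in a path or query string.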
High availability
- Stateful: Multi-AZ (Aurora, ElastiCache).
- Stateless: autoscaling; minimum N+1 capacity.
- Circuit breakers on every vendor call (Resilience4j).
- Bulkheads: separate thread pools / connection pools per critical external system to prevent cascading failure.
- Idempotency keys on every mutating external call.
- Outbox pattern for at-least-once event publishing.
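As an illustration of the outbox pattern listed above, the sketch below writes a business row and its event in one database transaction, then has a relay publish pending events. It uses an in-memory SQLite database purely so the example is self-contained; the table names, topic name, and publish callback are hypothetical, and the platform's OLTP store is PostgreSQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repayment (id INTEGER PRIMARY KEY, loan_id TEXT, amount TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        topic TEXT,
        payload TEXT,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def record_repayment(loan_id: str, amount: str) -> None:
    """Write the business row and its event atomically."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO repayment (loan_id, amount) VALUES (?, ?)", (loan_id, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("repayment.recorded", json.dumps({"loan_id": loan_id, "amount": amount})),
        )


def relay(publish) -> int:
    """Publish pending events; mark each one only after publish() returns."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending for retry
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


if __name__ == "__main__":
    record_repayment("LN-1", "1500.00")
    print(relay(lambda topic, payload: print(topic, payload)))
```

Because a crash between publish and the UPDATE causes the event to be sent again, delivery is at-least-once, which is why consumers need to deduplicate, for example on an idempotency key.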
Data localisation
Per RBI Payment System Data Storage direction, AA framework, KYC MD, and IT MD:
- All borrower data physically in Indian regions.
- Cross-region backups only to the Indian DR region.
- No vendor that requires data egress out of India is used for regulated data flows.
- DR plan documented; cross-region tested.
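The localisation rules above could be enforced with a periodic inventory check along these lines. This is a hedged sketch: the store names are invented, and how the inventory of stores and regions is obtained is left out; only the two region codes come from this section.

```python
# Primary and DR regions named in this section.
INDIAN_REGIONS = {"ap-south-1", "ap-south-2"}


def non_compliant_stores(stores: dict[str, str]) -> list[str]:
    """Return the names of data stores whose region is outside India."""
    return sorted(name for name, region in stores.items() if region not in INDIAN_REGIONS)


if __name__ == "__main__":
    # Hypothetical inventory: store name -> cloud region.
    inventory = {
        "evidence-bucket": "ap-south-1",
        "oltp-backup": "ap-south-2",
        "log-archive": "us-east-1",
    }
    print(non_compliant_stores(inventory))
```

A non-empty result would be treated as a compliance exception to investigate rather than a warning to tune away.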
Why this shape (not microservices from day 1)
A pure microservices architecture on day 1 is a mistake for an early NBFC engineering team. Reasons:
- Operational overhead: 15–20 services from day 1 means 15–20 deployment pipelines, observability dashboards, and on-call rotations.
- Distributed-transaction complexity: multi-service consistency requires either sagas (workflow engine) or eventual consistency (complex to reason about for accounting flows). The modular monolith keeps these as DB transactions until extraction is necessary.
- Refactoring friction: module boundaries take time to settle; in a monolith, refactors are cheap; across microservices, they are expensive.
- Team velocity: until the platform has shipped and been observed in production, splitting services trades business velocity for premature engineering optimisation.
The right shape: modular monolith with strict module boundaries, with selective extraction once specific scaling / regulatory / partner-isolation triggers fire. See 5.2 Services and modules for the extraction criteria.