Skip to content

14.2 Data and storage

OLTP — PostgreSQL

Postgres 15+ (RDS or Aurora PostgreSQL).
Multi-AZ for HA.
Logical replication for CDC to warehouse.
One logical schema per module (acquisition, application, kyc, ingestion, decisioning, lms, …).
Foreign keys + check constraints where they help; avoid where they hurt migration speed.
Indexes: deliberate; track via pg_stat_statements.
Connection pool: HikariCP (Spring Boot default).

Schema discipline

Modular: cross-schema queries discouraged; cross-module reads via service APIs.
Migration tool: Flyway. Migrations versioned in repo; reversible where safe; non-reversible (data) clearly flagged.
Naming convention: snake_case for tables / columns; consistent ID suffix (_id).

Event sourcing (for accounting)

loan_event table is append-only; current state derived.
Use outbox pattern for publishing to event bus.

Object storage — S3

Versioned buckets for documents.
Object Lock (Compliance mode) for evidence packages — 8 – 10 year retention.
Lifecycle rules: move to Glacier after 1 year for cold artefacts.
Server-side encryption with KMS-managed keys.

Cache — Redis

ElastiCache Redis Multi-AZ.
Used for:
- Decision-engine policy cache.
- Bureau report cache (TTL-based per policy).
- Rate limiting counters.
- Session / short-lived auth tokens.

Search — OpenSearch

Document and case search.
Audit-log indexing.
Free-text borrower / case search by ops users.
Operational log indexing if not using Loki.

Event bus

MVP: RabbitMQ (managed if available). Simpler, lower ops overhead.
Scale: Kafka / MSK for higher throughput, partitioning, ordering.
Schema registry: AWS Glue Schema Registry or Confluent.

Warehouse

MVP: PostgreSQL read replica + scheduled extracts; sufficient for < 10M rows / table.
Scale: Snowflake or ClickHouse — see 5.6 for trade-offs.

ETL

CDC with Debezium (PostgreSQL → Kafka → warehouse).
dbt for SQL transformations Bronze → Silver → Gold.
Airflow for any complex pipelines / one-off batches.

BI

Metabase at MVP (open source; easy).
Superset at scale.
Looker / Tableau if enterprise rigour justified.

Backup and recovery

Daily snapshots of all databases.
Point-in-time recovery for Postgres (5 – 7 day window).
Cross-region replication of backups to Indian DR region.
Tested restoration quarterly.

Data archival

Active operational data in OLTP.
Historical data (> 3 years) archived to S3 (compressed parquet).
Restoration path documented.

Reference data

Static reference tables (NIC codes, IFSC list, GST states) maintained in DB; refreshed on schedule.