Skip to content

14.2 Data and storage

  • Postgres 15+ (RDS or Aurora PostgreSQL).
  • Multi-AZ for HA.
  • Logical replication for CDC to warehouse.
  • One logical schema per module (acquisition, application, kyc, ingestion, decisioning, lms, …).
  • Foreign keys + check constraints where they help; avoid where they hurt migration speed.
  • Indexes: deliberate; track via pg_stat_statements.
  • Connection pool: HikariCP (Spring Boot default).
  • Modular: cross-schema queries discouraged; cross-module reads via service APIs.
  • Migration tool: Flyway. Migrations versioned in repo; reversible where safe; non-reversible (data) clearly flagged.
  • Naming convention: snake_case for tables / columns; consistent ID suffix (_id).
  • loan_event table is append-only; current state derived.
  • Use outbox pattern for publishing to event bus.
  • Versioned buckets for documents.
  • Object Lock (Compliance mode) for evidence packages — 8 – 10 year retention.
  • Lifecycle rules: move to Glacier after 1 year for cold artefacts.
  • Server-side encryption with KMS-managed keys.
  • ElastiCache Redis Multi-AZ.
  • Used for:
    • Decision-engine policy cache.
    • Bureau report cache (TTL-based per policy).
    • Rate limiting counters.
    • Session / short-lived auth tokens.
  • Document and case search.
  • Audit-log indexing.
  • Free-text borrower / case search by ops users.
  • Operational log indexing if not using Loki.
  • MVP: RabbitMQ (managed if available). Simpler, lower ops overhead.
  • Scale: Kafka / MSK for higher throughput, partitioning, ordering.
  • Schema registry: AWS Glue Schema Registry or Confluent.
  • MVP: PostgreSQL read replica + scheduled extracts; sufficient for < 10M rows / table.
  • Scale: Snowflake or ClickHouse — see 5.6 for trade-offs.
  • CDC with Debezium (PostgreSQL → Kafka → warehouse).
  • dbt for SQL transformations Bronze → Silver → Gold.
  • Airflow for any complex pipelines / one-off batches.
  • Metabase at MVP (open source; easy).
  • Superset at scale.
  • Looker / Tableau if enterprise rigour justified.
  • Daily snapshots of all databases.
  • Point-in-time recovery for Postgres (5 – 7 day window).
  • Cross-region replication of backups to Indian DR region.
  • Tested restoration quarterly.
  • Active operational data in OLTP.
  • Historical data (> 3 years) archived to S3 (compressed parquet).
  • Restoration path documented.
  • Static reference tables (NIC codes, IFSC list, GST states) maintained in DB; refreshed on schedule.