5.4 Events and async
Why events
Section titled “Why events”Lending is intrinsically event-driven: sanction triggers documentation; documentation triggers disbursement; disbursement triggers LMS activation; every repayment triggers ledger updates, classification, and analytics. Modelling these as synchronous chains is fragile (one slow downstream blocks everything) and unauditable.
Events give:
- Loose coupling between modules.
- Audit trail by default — events are immutable.
- Replay capability for reconciliation and recovery.
- Easy fan-out to new consumers (analytics, partner systems, audit, ML feature store).
Event bus choice
Section titled “Event bus choice”| Option | When to choose | Trade-off |
|---|---|---|
| RabbitMQ | MVP, lower throughput, simpler ops | Less ordering / partitioning power |
| Kafka (MSK) | Scale, partitioning, ordering, large fan-out | Operational overhead, more complex client semantics |
| SQS + SNS | AWS-native simple at very low volume | Limited features for complex routing |
For this platform: RabbitMQ at MVP (lower ops; sufficient throughput for < 1000 events/sec). Kafka / MSK at scale when ordering per loan and very high fan-out matter.
Event taxonomy
Section titled “Event taxonomy”Events are namespaced by module. Sample of the core set:
application.created application.submittedkyc.individual.completed kyc.individual.failedkyb.entity.completeddata.aa.consent_granted data.aa.fetch_succeeded data.aa.fetch_faileddata.bank_statement.parsed data.gst.pulled data.bureau.pulleddecision.run.started decision.run.completeddecision.approved decision.declined decision.referreddeviation.requested deviation.approved deviation.declinedsanction.issueddocument.generated kfs.acknowledgedesign.completed esign.failedmandate.active mandate.faileddisbursement.created disbursement.approved disbursement.executeddisbursement.failed utr.capturedloan.activatedaccrual.postedrepayment.received repayment.allocatedbounce.receivedclassification.changedprovision.computedrestructure.appliedwriteoff.appliedloan.closedlender_share.posted settlement.completeddlg.invokedcollection.case.opened ews.raisedaudit.eventSchema management
Section titled “Schema management”Every event has a schema in a schema registry (Confluent Schema Registry, AWS Glue Schema Registry, or a homegrown JSON-Schema store).
Rules:
- Backward compatibility mandatory — consumers tolerate optional new fields.
- No breaking changes without versioning the topic (
loan.activated.v2). - Schema review in CI — every new event or change requires registry update + compatibility check.
- Consumer SDK generated from schema so consumers compile-fail on incompatible reads.
Idempotency
Section titled “Idempotency”Every event-producing API call carries an idempotency key (UUID); every consumer dedupes by key.
Pattern:
Producer: - Include `idempotency_key` in event metadata (same as request's idempotency key).
Consumer: - On receive: check `seen_idempotency_keys` table for this consumer+topic. - If present: skip (it's a duplicate replay). - Else: insert key into seen table in same transaction as business processing. - Commit. - Ack to broker.This makes consumers safe under at-least-once delivery, which is the only reliable delivery semantic across broker failures.
Retries and DLQ
Section titled “Retries and DLQ”- Transient failures (network, vendor 5xx, DB deadlock) → retry with exponential backoff (
100ms,200ms,400ms, …, capped at30s, max10 attempts). - Permanent failures (schema mismatch, business-rule violation in consumer) → dead-letter queue (DLQ) with full message + error context.
- DLQ inspection UI for ops to triage, fix root cause, and re-process.
- Alerts on DLQ size growth (PagerDuty / Slack).
Ordering and partitioning
Section titled “Ordering and partitioning”For events that need ordering (e.g., all events for the same loan):
- Partition key =
loan_id(Kafka); ensures all events for the same loan land on the same partition and consumer. - Within a partition, Kafka guarantees order.
- Across partitions, no order guarantee — design consumers to be order-tolerant per partition only.
For events without ordering requirements (e.g., notifications), any partition strategy works.
Event sourcing for accounting flows
Section titled “Event sourcing for accounting flows”Loan-affecting writes (disbursement, accrual, repayment, charge, classification, write-off) are append-only events on the loan_event table (and emitted on the bus).
Current state is derived. The loan_account table’s current_outstanding, current_classification, etc. are projections from the event stream.
Benefits
Section titled “Benefits”- Auditability — every state change explicit and immutable.
- Time travel — query the loan’s state at any past timestamp.
- Reconciliation — replay against an alternate projection to verify.
- Debugging — see exactly what happened in sequence.
- Discipline required — every state-changing write must go through the event path.
- Storage growth — events accumulate; old events partition / archive policy needed.
- Projection complexity — projections must be carefully built and tested.
Recommendation
Section titled “Recommendation”Event source the LMS module from day 1. Other modules can use simpler state-machine patterns; event-source only where the audit and reconciliation benefits justify the discipline cost.
Common patterns
Section titled “Common patterns”Outbox
Section titled “Outbox”Producers write event records to an outbox table in the same database transaction as the business change. A separate outbox publisher process reads the outbox and publishes to the bus.
This avoids the dual-write inconsistency where a business change commits but the event publication fails.
Business transaction: INSERT INTO loan_event (...); INSERT INTO outbox (event_type, payload, status='pending');COMMIT;
Outbox publisher (async, separate process): SELECT * FROM outbox WHERE status='pending' LIMIT 100; for each row: publish to bus; UPDATE outbox SET status='published' WHERE id=...;Most events in the platform use this pattern.
For multi-step workflows that span modules (sanction → docs → disbursement → LMS activation), use sagas orchestrated by the workflow engine (5.5). Sagas handle compensating actions on failure (e.g., revoke docs, cancel mandate if disbursement fails).
CQRS (selectively)
Section titled “CQRS (selectively)”For high-read entities (borrower view, loan view), maintain a read model distinct from the write model. The read model is updated by consuming events. This lets reads scale independently and avoids contention on the write model.
Use CQRS selectively — not every entity needs it. LMS read views (statements, dashboards) are good candidates.
Observability for the event bus
Section titled “Observability for the event bus”- Event lag monitoring — consumer lag per topic.
- Throughput per topic and per consumer.
- DLQ size alerts.
- Schema-incompatibility alerts in CI.
- Per-event tracing via distributed-tracing IDs propagated through event metadata.
Standard dashboards:
- Topic-level: rate, lag, error rate.
- Consumer-level: throughput, lag, retry rate, DLQ rate.
- Event-type-level: count, age distribution, top consumers.
Schema example
Section titled “Schema example”A representative event schema (loan.activated):
{ "$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object", "title": "loan.activated v1", "required": ["event_id","occurred_at","loan_id","sanction_id","borrower_id"], "properties": { "event_id": { "type": "string", "format": "uuid" }, "occurred_at": { "type": "string", "format": "date-time" }, "loan_id": { "type": "integer" }, "sanction_id": { "type": "integer" }, "borrower_id": { "type": "integer" }, "product_id": { "type": "integer" }, "initial_principal": { "type": "number" }, "co_lender_share": { "type": "object", "properties": { "partner_lender_id": { "type": "integer" }, "originator_pct": { "type": "number" }, "partner_pct": { "type": "number" } } }, "metadata": { "type": "object", "properties": { "trace_id": { "type": "string" }, "idempotency_key": { "type": "string" } } } }}What event-bus is and isn’t
Section titled “What event-bus is and isn’t”- It is the channel for loose-coupled state propagation between modules and for downstream fan-out.
- It is not the transactional persistence layer — that remains PostgreSQL with ACID transactions.
- It is not a substitute for synchronous APIs — those exist for read paths and command paths where the caller needs immediate confirmation.
- It is not unbounded — every event-bus topic, every consumer, every retry policy is deliberate.
Related
Section titled “Related”- 5.5 Workflow and rules engines — sagas orchestrated above events.
- 5.6 Data platform and warehouse — CDC + event stream feed the warehouse.
- 5.7 Security, IAM, audit —
audit.eventintegrity. - 5.8 Sequence diagrams — see events flowing through the diagrams.
- 5.11 Worked data example —
loan_eventrows for one loan.