13.5 Data ingestion
Epic overview
The ingestion module is the single biggest determinant of underwriting quality. Every credit decision is only as good as the data it sees. Get this right and every downstream module benefits; get it wrong and credit costs are permanently inflated by bad signal.
Build orchestrated, consent-driven, multi-vendor, multi-source ingestion that handles AA, PDF upload, GST pulls, Tally exports, MCA fetches, bureau pulls, and periodic refresh — all surfaced in a single “data room” view per borrower.
User stories
- As a borrower, I want to authorise my bank data via AA in one click rather than uploading PDFs.
- As a borrower whose bank doesn’t have AA coverage, I want a fallback to PDF upload that still produces clean structured data.
- As a borrower, I want to enter my GSTN OTP once and have the platform pull 24 months of returns automatically.
- As a CA, I want to upload my SME client’s Tally backup file and have the platform extract P&L, balance sheet, ageing, and inventory automatically.
- As an analyst, I want a single “data room” view per borrower with every data source’s status and key metrics surfaced at a glance.
- As an underwriting engine, I want every data point I read to come with its source, vendor, timestamp, and parsing version recorded.
- As a system, I want to refresh AA bank statements monthly for active revolving-line borrowers so my early-warning signals stay fresh.
- As a system, I want to detect bank-statement tampering before it reaches underwriting and route the application to the fraud queue.
- As an analyst, I want the data room to flag reconciliation issues (GST sales vs bank credits, Tally vs GST, etc.) automatically.
- As an analyst, I want to see invoice IRN cross-verification results when a borrower uploads invoices for an invoice-backed product.
- As a CTO, I want to swap BSA vendor from Perfios to FinBox via routing config without changing application code.
- As a compliance officer, I want every data fetch to reference a valid consent and respect the consent’s expiry / revocation.
- As an operator, I want a dashboard of in-flight pulls, success rates per vendor, and average latencies.
- As a borrower whose GST returns aren’t filed for the latest period, I want a clear message asking me to file before proceeding.
- As an underwriting analyst, I want manufacturer purchase data verified via GSTR-2A for distributor borrowers as part of the data room.
API requirements
Account Aggregator
- `POST /v1/data/aa/consent-request`: initiate consent. Body: `{ borrower_id, purpose, fi_types, date_range, frequency, validity }`. Returns `{ consent_handle, redirect_url }`.
- `GET /v1/data/aa/consents/{id}/status`: check consent status.
- `POST /v1/data/aa/fetch`: trigger fetch. Body: `{ consent_handle }`. Returns `{ fetch_id, status }`.
- `POST /v1/data/aa/consents/{id}/revoke`: handle revocation.
- `POST /v1/data/aa/webhook`: receive AA webhooks (consent updates, revocations).
Bank statement (PDF / netbanking)
- `POST /v1/data/bank-statement/upload`: upload PDF. Body: multipart with file + `{ account_ref?, password? }`. Returns `{ pull_id, status }`.
- `POST /v1/data/bank-statement/netbanking-fetch`: initiate netbanking fetch. Body: `{ bank, credentials_session }`. Returns redirect.
- `GET /v1/data/bank-statement/{id}`: retrieve parsed result.
GST
- `POST /v1/data/gst/initiate`: initiate GSP-based pull. Body: `{ borrower_id, gstin, periods, gsp }`. Returns `{ otp_required, session_id }`.
- `POST /v1/data/gst/otp-submit`: submit borrower’s OTP.
- `POST /v1/data/gst/fetch`: fetch GSTR-1 / 3B / 2A.
- `GET /v1/data/gst/{id}`: retrieve parsed result.
Tally / accounting
- `POST /v1/data/tally/upload`: upload Tally backup. Body: multipart + `{ password?, company_name? }`.
- `POST /v1/data/zoho/connect`: OAuth flow to Zoho Books.
- `POST /v1/data/zoho/fetch`: fetch books data.
- `POST /v1/data/manual/upload`: manual P&L / BS upload fallback.
MCA
- `POST /v1/data/mca/fetch`: fetch by CIN / LLPIN.
- `GET /v1/data/mca/{id}`: retrieve.
- `POST /v1/data/mca/refresh`: re-pull (typically at sanction stage).
Bureau
- `POST /v1/data/bureau/pull`: pull bureau report. Body: `{ subject_type, subject_id, cic, report_type, soft_pull? }`.
- `GET /v1/data/bureau/{id}`: retrieve report.
- `POST /v1/data/bureau/monthly-submission/{cic}/{period}`: trigger monthly submission file generation.
Invoice
- `POST /v1/data/invoices/bulk-upload`: bulk invoice upload.
- `POST /v1/data/invoices/{id}/irn-verify`: verify IRN against GST IRP.
- `GET /v1/data/invoices/{id}`: retrieve.
Marketplace
- `POST /v1/data/marketplace/{platform}/connect`: OAuth / API connect for marketplace seller.
- `POST /v1/data/marketplace/{platform}/fetch`: fetch settlement reports.
Aggregated view
- `GET /v1/borrowers/{id}/data-room`: aggregated catalogue of all pulls + key metrics + reconciliation flags.
- `GET /v1/borrowers/{id}/data-room/refresh-status`: what’s stale, what’s fresh.
Periodic refresh
- `GET /v1/data/refresh-schedule`: list of due refreshes.
- `POST /v1/data/refresh-schedule/{id}/trigger`: manual trigger.
- Cron job: scheduled refresh for active borrowers.
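
The scheduled refresh can be pictured as a filter over past pulls. The following is a minimal sketch, assuming a fixed interval per source; `PullRecord`, `REFRESH_INTERVALS`, `due_refreshes`, and the source key `aa_bank_statement` are hypothetical names, and "monthly" is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PullRecord:
    """Hypothetical slice of a data_pull row; field names are illustrative."""
    borrower_id: str
    source: str
    last_success_at: datetime
    borrower_active: bool


# Assumed intervals per source; "monthly" is approximated as 30 days here.
REFRESH_INTERVALS = {"aa_bank_statement": timedelta(days=30)}


def due_refreshes(pulls, now):
    """Return pulls for active borrowers whose refresh interval has elapsed."""
    due = []
    for pull in pulls:
        interval = REFRESH_INTERVALS.get(pull.source)
        if interval is None or not pull.borrower_active:
            continue  # no schedule for this source, or borrower not active
        if now - pull.last_success_at >= interval:
            due.append(pull)
    return due
```

A cron job could call `due_refreshes` on each run and enqueue one fetch per returned record; the manual trigger endpoint would bypass the interval check.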
Data model
- `data_pull`: every pull request.
- `bank_statement`, `gst_report`, `tally_export`, `mca_snapshot`, `bureau_report`, `invoice`, `marketplace_settlement`.
- `data_pull_error`: captures parse / API failures for ops triage.
- `data_reconciliation_flag`: derived flag when sources don’t match.
See 5.3 Core data model for full schema.
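
To illustrate the traceability requirement (source, vendor, timestamp, parsing version per data point), here is a minimal sketch of a provenance record attached to a metric. It is not the schema from 5.3; `Provenance`, `DataPoint`, `describe`, and all field names and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Provenance:
    """Where a value came from; mirrors the fields named in the user stories."""
    pull_id: str
    source: str
    vendor: str
    fetched_at: datetime
    parser_version: str


@dataclass(frozen=True)
class DataPoint:
    """A single metric that always carries its provenance."""
    name: str
    value: Decimal
    provenance: Provenance


def describe(point):
    """Render a data point with its full lineage, e.g. for the data-room view."""
    p = point.provenance
    return (
        f"{point.name}={point.value} "
        f"[{p.source}/{p.vendor} pull={p.pull_id} "
        f"at={p.fetched_at.isoformat()} parser={p.parser_version}]"
    )
```

Making `provenance` a required field means the underwriting engine cannot construct or read a value that lacks its source pull.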
UI screens
- Borrower data-collection screens: AA consent (redirect to AA UI), PDF upload, GST OTP entry, Tally upload guidance.
- Status tracker showing progress of each data pull.
- Analyst data-room view: aggregated, with per-source status, metrics, reconciliation flags, drill-down to raw data.
- Admin vendor routing config: per data type, primary + fallback vendor, rate limits.
- Admin pull-monitoring dashboard: real-time per-vendor success rate, latency, error categories.
Backend services
- Ingestion orchestration service: receives consent, routes to vendor, handles async / sync.
- Per-source adapters: AA, BSA, GST, MCA, Tally, Zoho, bureau, invoice, marketplace.
- Normalisation service: converts vendor-specific responses to internal canonical schema.
- Reconciliation service: runs cross-source checks; flags discrepancies.
- Periodic refresh scheduler: cron-driven for active borrowers.
- Tampering detector: orchestrates vendor flags and own checks.
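
The combination of per-source adapters and admin routing config is what would allow a vendor swap without code changes. A minimal sketch under stated assumptions: the config shape, the `ADAPTERS` registry, and both function names are hypothetical, and the adapters are stubs rather than real vendor clients.

```python
from typing import Callable, Dict, List

# Hypothetical routing config: per data type, primary vendor first, then fallbacks.
ROUTING_CONFIG: Dict[str, List[str]] = {
    "bsa": ["perfios", "finbox"],
}

# Adapter registry: vendor name -> callable. Stubs stand in for real clients.
Adapter = Callable[[dict], dict]
ADAPTERS: Dict[str, Adapter] = {
    "perfios": lambda req: {"vendor": "perfios", "raw": req},
    "finbox": lambda req: {"vendor": "finbox", "raw": req},
}


def resolve_adapters(data_type: str, config=ROUTING_CONFIG) -> List[Adapter]:
    """Return adapters for a data type in the order given by the config."""
    vendors = config.get(data_type)
    if not vendors:
        raise KeyError(f"no vendor routing configured for {data_type!r}")
    return [ADAPTERS[name] for name in vendors]


def primary_vendor_result(data_type: str, request: dict, config=ROUTING_CONFIG) -> dict:
    """Call only the primary vendor for this data type."""
    return resolve_adapters(data_type, config)[0](request)
```

Swapping the order to `{"bsa": ["finbox", "perfios"]}` changes the primary vendor while callers keep using `primary_vendor_result("bsa", ...)` unchanged.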
External integrations
- AA TSPs: Setu, FinBox (primary), OneMoney FIU SDK.
- BSA: Perfios (primary), FinBox BankConnect, Precisa, ScoreMe.
- GSP: Cygnet, Karix, Webtel, MasterIndia, Taxgenie, Vayana.
- Tally / accounting: Karza (Perfios) Tally connector; Zoho REST API; vendor-specific for Busy / Marg / Vyapar.
- MCA: Karza, Probe42, Tofler, Signzy.
- Bureau: All 4 CICs via direct or aggregator (Karza / Perfios).
- GST IRP for IRN verification (via GSP).
- Marketplace: Amazon SP-API, Flipkart Seller API, Meesho where API exists.
Test cases
Happy path
- AA fetch successful for HDFC / ICICI / Axis primary banks.
- PDF parse succeeds for common bank statement formats.
- GST returns parsed; metrics extracted (turnover, filing pattern, top buyers).
- Tally backup parsed; P&L + BS + ageing extracted.
- Bureau pull cached for 30 days; no duplicate enquiries.
- Periodic refresh runs on schedule.
- Data-room aggregated view loads in < 3 seconds.
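
The bureau caching case above can be sketched as a time-bounded cache in front of the pull. This is an illustration only: `BureauCache`, `pull_fn`, and the in-memory dictionary are assumptions, and a real service would persist entries.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=30)  # the 30-day window from the test case


class BureauCache:
    """Serve a cached report while it is fresh to avoid duplicate enquiries."""

    def __init__(self, pull_fn):
        self._pull_fn = pull_fn  # hypothetical callable that performs the real pull
        self._entries = {}       # (subject_id, cic) -> (pulled_at, report)

    def get_report(self, subject_id, cic, now):
        key = (subject_id, cic)
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < CACHE_TTL:
            return cached[1]  # still fresh: no new enquiry is made
        report = self._pull_fn(subject_id, cic)
        self._entries[key] = (now, report)
        return report
```

A test for "no duplicate enquiries" can inject a counting `pull_fn` and check that two requests inside the window produce exactly one call.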
Failure / edge
- AA returns partial data (borrower selected only some accounts) → flag for re-consent.
- PDF tampered → BSA flag → tampering queue.
- PDF is image-only (scanned) → OCR fallback; quality check.
- Tally export password-protected → prompt the borrower for the password.
- GST return not filed for recent period → soft block + nudge borrower.
- Bureau no-hit on PAN → check name variations; refer for manual.
- MCA shows “Strike-off” → DECLINE immediately.
- IRN verification fails for above-threshold invoice → flag.
- Vendor API timeout → retry → after N retries, failover to secondary vendor.
- Consent expired mid-fetch → re-consent flow.
- Borrower’s GST OTP fails → retry → switch GSP.
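
The timeout case (retry, then fail over after N retries) could look roughly like the sketch below. `VendorTimeout`, `fetch_with_failover`, and the `(name, callable)` vendor list are hypothetical, and backoff between retries is omitted for brevity.

```python
class VendorTimeout(Exception):
    """Raised by a (hypothetical) adapter when the vendor call times out."""


def fetch_with_failover(vendors, request, max_retries=2):
    """Try each vendor in order; each gets one attempt plus max_retries retries.

    `vendors` is an ordered list of (name, callable) pairs, primary first.
    Returns the first successful result and the list of attempted vendor names.
    """
    attempts = []
    for name, call in vendors:
        for _ in range(1 + max_retries):
            attempts.append(name)
            try:
                return call(request), attempts
            except VendorTimeout:
                continue  # retry the same vendor until its budget is spent
    raise VendorTimeout(f"all vendors timed out after {len(attempts)} attempts")
```

Returning the attempt log alongside the result gives the pull-monitoring dashboard the per-vendor data it needs for success rates.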
Reconciliation
- GST sales ₹4 Cr, bank credits ₹3 Cr → flag divergence 25%.
- Tally revenue ₹4.5 Cr, GST sales ₹4 Cr → flag divergence 12.5%.
- Cash deposits 40% of credits → REFER.
- Inter-account transfers detected → de-duplicate in credit-turnover calculation.
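
Both divergence figures above are consistent with measuring the gap relative to GST sales. A minimal sketch of that calculation follows; the function names are illustrative, and the 10% flag threshold is a hypothetical placeholder because this section does not state one.

```python
from decimal import Decimal


def divergence_pct(reference, other):
    """Absolute gap between two figures as a percentage of `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return abs(reference - other) / reference * 100


# Hypothetical threshold; the section does not specify one.
FLAG_THRESHOLD_PCT = Decimal("10")


def should_flag(reference, other, threshold=FLAG_THRESHOLD_PCT):
    """True when the divergence meets or exceeds the threshold."""
    return divergence_pct(reference, other) >= threshold
```

With GST sales as the reference, `divergence_pct(Decimal("4"), Decimal("3"))` evaluates to 25 and `divergence_pct(Decimal("4"), Decimal("4.5"))` to 12.5, matching the two cases listed.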
Edge cases
- Bank statement PDF spans multiple files with overlap → dedupe transactions on hash + ref + amount + date.
- Multi-account merge for borrower with 4 operating accounts → reconcile across.
- Tally backup with multiple companies → pick the relevant entity, ignore others.
- Borrower with multiple GSTINs (multi-state) → pull each separately, consolidate.
- Bureau report missing some fields → graceful degradation; flag in data room.
- Marketplace settlement period overlaps with bank settlement → reconcile to avoid double-count.
- Borrower’s primary bank changes mid-loan → trigger fresh AA for the new account.
- AA TSP outage → automatic failover to secondary TSP.
- Vendor rate-limit exceeded → queue + retry with backoff; admin alert if persistent.
- BSA vendor parser version change → reprocess affected applications; verify metrics didn’t shift.
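
The overlapping-PDF case in the list above can be sketched as a keyed merge. The section does not say what "hash" covers, so this example assumes it is a hash of the normalised narration; `Txn`, `txn_key`, and `merge_statements` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    txn_date: date
    amount: Decimal
    ref: str
    narration: str


def txn_key(txn):
    """Assumed dedupe key: narration hash + ref + amount + date."""
    normalised = " ".join(txn.narration.lower().split())
    narration_hash = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return (narration_hash, txn.ref.strip(), txn.amount, txn.txn_date)


def merge_statements(*statements):
    """Concatenate statement files, keeping the first copy of each transaction."""
    seen = set()
    merged = []
    for statement in statements:
        for txn in statement:
            key = txn_key(txn)
            if key not in seen:
                seen.add(key)
                merged.append(txn)
    return merged
```

One caveat of this simple key: two genuinely distinct transactions that share all four attributes would be collapsed, so a production version would likely also compare running balances or per-file occurrence counts.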
Acceptance criteria
- AA fetch success rate > 90% for major banks.
- PDF parse success rate > 90%.
- GST pull success rate > 95% (once OTP entered).
- Data room loads aggregated view in < 3 seconds.
- Periodic refresh runs on schedule with > 95% success.
- Tampering detection live with at least one vendor.
- Reconciliation flags computed for every borrower with both GST and bank data.
- Every data point in data room is traceable to source pull (vendor, timestamp, version).
- Vendor failover tested with chaos drills.
Compliance touchpoints
- RBI AA Master Direction: every fetch must reference a valid consent artefact.
- Digital Lending Guidelines: data minimisation; consent for each purpose.
- DPDP Act 2023: purpose limitation; retention schedule; deletion on revocation.
- CICRA: bureau pull + monthly submission.
- Outsourcing MD: every vendor governed.
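
The consent-related obligations above translate naturally into a guard that runs before any fetch. A minimal sketch, assuming a simplified consent record; `Consent`, `ConsentError`, `assert_fetch_allowed`, and the field names are hypothetical and do not reproduce the AA consent artefact format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class ConsentError(Exception):
    """Raised when a fetch is attempted without a usable consent."""


@dataclass(frozen=True)
class Consent:
    consent_id: str
    purpose: str
    expires_at: datetime
    revoked_at: Optional[datetime] = None


def assert_fetch_allowed(consent, purpose, now):
    """Raise ConsentError unless the consent is live and covers the purpose."""
    if consent.revoked_at is not None and consent.revoked_at <= now:
        raise ConsentError(f"consent {consent.consent_id} was revoked")
    if now >= consent.expires_at:
        raise ConsentError(f"consent {consent.consent_id} has expired")
    if consent.purpose != purpose:
        raise ConsentError(
            f"consent {consent.consent_id} does not cover purpose {purpose!r}"
        )
```

Calling this guard inside the orchestration service, rather than in each adapter, would keep the check in one place and make the "consent expired mid-fetch" case surface as a single exception type.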
Related
- 3.D Data ingestion module.
- 2.8 AA rules.
- 4.3 Account Aggregator, 4.4 BSA, 4.5 GST, 4.6 Accounting.
- 6. Underwriting — what the data feeds into.