3.D Data ingestion
Purpose
Acquire every underwriting input — voluntary, consented, or referential — and normalise it into a clean, structured form that the underwriting engine and analyst queue can consume.
This is the single most operationally critical module for cash-flow underwriting. Data quality at this step compounds through every downstream module.
In-scope features
Bank-statement ingestion
- PDF upload + parse — borrower uploads bank statement PDFs; BSA vendor parses and returns structured data.
- Netbanking fetch — borrower logs into netbanking through a vendor; statement pulled programmatically.
- AA-fetched statement — preferred; structured at source, no parsing errors.
- Multi-month coverage — typically 6–24 months.
- Multi-account merge — for borrowers with several operating accounts.
- Transaction categorisation — vendor-provided + own enrichment.
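For the multi-account merge, one way to avoid counting own-account transfers as revenue is to pair each debit with an equal credit in a different account of the same borrower within a short window. The sketch below is illustrative only: the `Txn` shape, the function names, and the one-day matching window are assumptions, not part of this spec or of any vendor's output.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    account: str     # hypothetical identifier of the borrower's account
    on: date
    amount: Decimal  # positive = credit, negative = debit


def flag_internal_transfers(txns: list[Txn], max_gap_days: int = 1) -> set[int]:
    """Return indexes of debit/credit pairs that look like own-account transfers."""
    internal: set[int] = set()
    for i, debit in enumerate(txns):
        if debit.amount >= 0 or i in internal:
            continue
        for j, credit in enumerate(txns):
            # The matching leg must be an unused credit of the same size.
            if j in internal or credit.amount != -debit.amount:
                continue
            if credit.account == debit.account:
                continue
            if abs((credit.on - debit.on).days) <= max_gap_days:
                internal.update({i, j})
                break
    return internal


def external_inflow(txns: list[Txn]) -> Decimal:
    """Sum credits after removing both legs of every flagged internal transfer."""
    internal = flag_internal_transfers(txns)
    return sum(
        (t.amount for k, t in enumerate(txns) if k not in internal and t.amount > 0),
        Decimal("0"),
    )
```

Amount-and-date matching is a heuristic; a production version would likely also compare counterparty account numbers or references where the parsed data carries them.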
GST data ingestion
- GST APIs via GSP — borrower’s consent flow (GSTN-issued OTP) to a GSP vendor; pull filings.
- GSTR-1 (outward supplies) — last 24–36 months.
- GSTR-3B (summary return) — last 24–36 months.
- GSTR-2A / 2B (inward supplies) — last 24–36 months.
- GST profile — turnover, return-filing consistency, late-filing history, suspension flags.
- E-invoice data — IRN-based for invoices above the e-invoice threshold.
- E-way bill data — where available; signals logistics activity.
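The filing-consistency and late-filing parts of the GST profile can be derived from the pulled returns with simple counting. The following is a minimal sketch under stated assumptions: the `Filing` record, its field names, and the output keys are hypothetical, and the due dates in any usage are placeholders rather than statutory dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Filing:
    period: str                # return period as "YYYY-MM" (assumed format)
    due: date                  # due date supplied by the caller
    filed_on: Optional[date]   # None means the return was not filed


def filing_profile(filings: list[Filing]) -> dict:
    """Summarise filing consistency for a list of return periods."""
    total = len(filings)
    filed = [f for f in filings if f.filed_on is not None]
    late = [f for f in filed if f.filed_on > f.due]
    missing = sorted(f.period for f in filings if f.filed_on is None)
    # "YYYY-MM" strings sort chronologically, so max() finds the latest period.
    latest = max(filings, key=lambda f: f.period) if filings else None
    return {
        "periods": total,
        "filed_ratio": len(filed) / total if total else 0.0,
        "late_count": len(late),
        "missing_periods": missing,
        "latest_period_filed": latest is not None and latest.filed_on is not None,
    }
```

A flag such as `latest_period_filed` could feed the fast-track decision, since an unfiled recent return is treated as a risk signal elsewhere in this section.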
ITR / income tax data
- ITR data — borrower uploads ITR PDFs or pulls them via an AA-equivalent flow (limited support today).
- Form 26AS / AIS / TIS — borrower-provided.
Accounting / Tally / ERP
- Tally backup — borrower / CA uploads a Tally `.tallybackup` file; a vendor or in-house parser extracts ledgers.
- Zoho Books export — JSON / Excel export.
- Busy / Marg / Vyapar — vendor-specific exports.
- QuickBooks — for borrowers using QuickBooks Online India.
- Manual P&L / balance sheet upload — fallback.
- Receivable ageing, payable ageing, inventory — extracted from accounting data.
MCA data
- MCA company / director lookup — auto-pulled at application; refreshed at sanction.
- Financial statements — for companies, last 3 years of audited financials (where filed).
- Annual filings (AOC-4, MGT-7) — for compliance status.
Bureau data (consumer + commercial)
- Bureau pull — see 3.E for usage.
- Bureau report parsing — structured fields extracted into platform’s schema.
- Bureau report caching — to avoid duplicate pulls during application lifecycle.
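Bureau report caching can be as simple as a keyed store with a time-to-live, so a second request inside the window reuses the earlier report instead of triggering another pull. This is a sketch only: `BureauCache`, the cache key, and the TTL value are assumptions, and a real deployment would use persistent storage rather than an in-memory dict.

```python
from datetime import datetime, timedelta
from typing import Callable


class BureauCache:
    """In-memory cache keyed by (borrower, CIC, report type)."""

    def __init__(self, ttl: timedelta):
        self._ttl = ttl
        self._store: dict[tuple[str, str, str], tuple[datetime, dict]] = {}

    def get_or_pull(
        self,
        borrower_id: str,
        cic: str,
        report_type: str,
        pull: Callable[[], dict],
        now: datetime,
    ) -> dict:
        key = (borrower_id, cic, report_type)
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: reuse, no new bureau pull
        report = pull()    # stale or absent: pull once and remember it
        self._store[key] = (now, report)
        return report
```

Passing `now` explicitly keeps the expiry logic easy to test; the appropriate TTL is a policy decision and is not specified here.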
Invoice / PO data
- Invoice upload for invoice-backed products.
- GST e-invoice IRN cross-validation — confirms invoice authenticity.
- PO upload for PO-backed products.
- Buyer-side confirmation for SCF / anchor-led — anchor uploads or APIs confirm invoice.
POS / payment-gateway data
- POS settlement data — for merchant lending (lower priority for the SME wedge).
- UPI / card settlement via payment-gateway APIs.
Marketplace data
- Amazon Marketplace, Flipkart Seller, Meesho, etc. — seller settlement and sales data; relevant for e-commerce seller financing.
Other / referential
- Cheque / NACH bounce data — from internal LMS history and (where available) consolidated industry feeds.
- Litigation data — court case checks via vendors (Karza, Probe42).
- GST cancellation / suspension feeds — periodic monitoring of borrower’s GSTIN status.
Out of scope
- The actual underwriting logic — see 3.E.
- KYC document fetch — see 3.C.
- AA consent UX — partly here (data pull) and partly in 3.A / 3.B (consent capture).
Key entities
- `DataPull` — per pull: source, vendor, consent reference, timestamp, status, raw response location, parsed location.
- `BankStatement` — borrower, account, period, transactions array, vendor parser version.
- `GstReport` — borrower, GSTIN, period, returns, GST summary.
- `TallyExport` — borrower, file location, ledger summary, parsed JSON location.
- `BureauReport` — borrower, CIC, type (consumer / commercial), pull timestamp, parsed JSON.
- `Invoice` — invoice number, IRN, value, due, buyer, status.
- `Po` — PO data.
- `ParsingError` — per failure.
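To make the `DataPull` entity concrete, here is one possible shape for it as a Python dataclass. The field list follows the description above; the attribute names, the `PullStatus` values, and the `mark_parsed` helper are assumptions for illustration and not a committed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class PullStatus(str, Enum):
    # Status values are assumed; the spec only says a pull has a status.
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class DataPull:
    source: str                     # e.g. "aa", "gst", "bureau"
    vendor: str
    consent_ref: Optional[str]      # consent reference, where one applies
    requested_at: datetime
    status: PullStatus = PullStatus.PENDING
    raw_location: Optional[str] = None     # where the raw response is stored
    parsed_location: Optional[str] = None  # where the parsed output is stored

    def mark_parsed(self, raw_location: str, parsed_location: str) -> None:
        """Record storage locations and mark the pull as succeeded."""
        self.raw_location = raw_location
        self.parsed_location = parsed_location
        self.status = PullStatus.SUCCEEDED
```

Keeping raw and parsed locations as separate fields lets a pull be re-parsed later, for example after a vendor parser upgrade, without fetching the data again.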
Key workflows
- AA pull pipeline — consent → fetch (multi-FIP) → encrypt-decrypt → parse → store → notify underwriting.
- PDF bank statement — upload → vendor parse → ingest → store; fallback to manual ops if parse fails.
- GST pull — borrower OTP → GSP pull → ingest → derive GST profile.
- Tally upload — borrower / CA uploads → parser → ledger extraction → underwriting features.
- Periodic refresh — for active borrowers (especially WC lines), refresh GST and bank on cadence.
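The PDF bank statement workflow, including its fallback to manual ops, can be sketched as a small function with injected dependencies. Everything here other than the two event names is hypothetical: `ParseFailed`, `ingest_pdf_statement`, and the callables stand in for the vendor client, storage layer, and ops queue.

```python
from typing import Callable


class ParseFailed(Exception):
    """Raised by the (hypothetical) vendor client when a PDF cannot be parsed."""


def ingest_pdf_statement(
    pdf_bytes: bytes,
    parse: Callable[[bytes], list[dict]],
    store: Callable[[list[dict]], None],
    enqueue_manual_review: Callable[[str], None],
) -> str:
    """Run upload -> vendor parse -> store and return the event name to emit."""
    try:
        transactions = parse(pdf_bytes)
    except ParseFailed as exc:
        # Fallback path: hand the statement to manual ops with the reason.
        enqueue_manual_review(str(exc))
        return "data.bank_statement.parse_failed"
    store(transactions)
    return "data.bank_statement.parsed"
```

Injecting `parse`, `store`, and `enqueue_manual_review` keeps the flow testable without a live vendor and makes it straightforward to swap BSA vendors.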
Integrations
See Section 4 for vendor detail. Key categories:
- BSA: Perfios, FinBox BankConnect, Precisa, ScoreMe, Karza.
- AA TSPs: Setu, FinBox, OneMoney FIU SDK.
- GSPs: Cygnet, Karix, Webtel, MasterIndia, Taxgenie, Vayana.
- MCA / commercial bureau: Karza, Probe42, Tofler, Signzy.
- Tally / accounting parsers: Karza, Perfios, vendor-specific.
- Bureau APIs: CIBIL, Experian, Equifax, CRIF (direct or via aggregator).
- Invoice / e-invoice: GST IRP via GSP.
API endpoints
- `POST /data/aa/consent-request` — initiate AA consent.
- `POST /data/aa/fetch` — trigger fetch given a valid consent.
- `POST /data/bank-statement/upload` — upload PDF for parsing.
- `POST /data/gst/pull` — pull GST returns given borrower consent.
- `POST /data/tally/upload` — upload Tally backup.
- `POST /data/mca/fetch` — fetch MCA data.
- `POST /data/bureau/pull` — pull bureau report.
- `POST /data/invoices/bulk-upload` — bulk-upload invoices.
- `GET /data/pulls/{id}` — status of a pull.
- `GET /borrowers/{id}/data-room` — aggregated view of all data pulled.
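Because pulls are asynchronous, a caller would typically poll `GET /data/pulls/{id}` until the pull reaches a terminal state. The sketch below shows polling with exponential backoff; the status strings, the `wait_for_pull` helper, and the `get_status` callable (which would wrap the HTTP call) are assumptions, since the response schema is not defined in this section.

```python
import time
from typing import Callable

# Assumed terminal status values; the actual API may use different names.
TERMINAL = {"succeeded", "failed"}


def wait_for_pull(
    pull_id: str,
    get_status: Callable[[str], str],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a pull's status, doubling the wait between attempts."""
    for attempt in range(attempts):
        status = get_status(pull_id)
        if status in TERMINAL:
            return status
        if attempt < attempts - 1:
            sleep(delay_s * 2 ** attempt)
    return "timeout"
```

Subscribing to the emitted events is the alternative to polling where the caller can consume them.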
Events emitted
- `data.aa.consent_granted` / `data.aa.fetch_succeeded` / `data.aa.fetch_failed`
- `data.bank_statement.parsed` / `data.bank_statement.parse_failed`
- `data.gst.pulled`
- `data.tally.parsed`
- `data.bureau.pulled`
- `data.refresh.due` (periodic)
- `data.quality.alert` (anomaly detection)
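A common envelope makes these events easier for downstream consumers such as the underwriting engine to handle. The sketch below validates the event name against the list above and serialises a JSON envelope; the envelope fields (`id`, `name`, `borrower_id`, `emitted_at`, `payload`) and the `build_event` helper are assumptions, not a defined contract.

```python
import json
import uuid
from datetime import datetime, timezone

KNOWN_EVENTS = {
    "data.aa.consent_granted",
    "data.aa.fetch_succeeded",
    "data.aa.fetch_failed",
    "data.bank_statement.parsed",
    "data.bank_statement.parse_failed",
    "data.gst.pulled",
    "data.tally.parsed",
    "data.bureau.pulled",
    "data.refresh.due",
    "data.quality.alert",
}


def build_event(name: str, borrower_id: str, payload: dict) -> str:
    """Return a JSON-serialised event envelope, rejecting unknown event names."""
    if name not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {name}")
    envelope = {
        "id": str(uuid.uuid4()),  # unique id so consumers can de-duplicate
        "name": name,
        "borrower_id": borrower_id,
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return json.dumps(envelope)
```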
Edge cases
- Bank statement PDF is image-only (scanned) — OCR fallback; quality varies wildly; some vendors handle, some don’t.
- Tampered bank statement — many vendors detect tampering signals (text-layer inconsistencies, fonts, totals); workflow handles flag.
- AA returns partial data — borrower selected only some accounts; reconcile with claimed accounts; nudge for missing.
- GST return not filed for recent period — risk signal; halt fast-track flow.
- Tally export password-protected — borrower must supply password or unprotect.
- MCA company status “Struck off” — block.
- Bureau “No record” — could be valid (new business) or fraud (PAN mismatch); investigate.
- Invoice IRN not found on GST IRP — invoice may be old / below e-invoice threshold / fraudulent.
- Bank statement spans multiple PDFs with overlap — dedupe transactions on hash + ref + amount + date.
- Multi-account merge with inter-account transfers — circular transactions detected and netted; double counting avoided.
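For the overlapping-PDF case above, de-duplication can key each transaction on a hash of its normalised date, amount, reference, and narration. This is a minimal sketch: the dict keys (`date`, `amount`, `ref`, `narration`) and the normalisation rules are assumptions about the parsed output, not a vendor format.

```python
import hashlib
from datetime import date
from decimal import Decimal


def txn_key(on: date, amount: Decimal, ref: str, narration: str) -> str:
    """Build a stable fingerprint from normalised transaction fields."""
    parts = [
        on.isoformat(),
        f"{amount:.2f}",                      # fixed two-decimal amount
        ref.strip().upper(),
        " ".join(narration.split()).upper(),  # collapse whitespace, ignore case
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def dedupe(statements: list[list[dict]]) -> list[dict]:
    """Merge parsed statements, keeping the first copy of each transaction."""
    seen: set[str] = set()
    merged: list[dict] = []
    for statement in statements:
        for txn in statement:
            key = txn_key(txn["date"], txn["amount"], txn["ref"], txn["narration"])
            if key not in seen:
                seen.add(key)
                merged.append(txn)
    return sorted(merged, key=lambda t: t["date"])
```

One limitation to keep in mind: two genuinely distinct transactions with identical date, amount, reference, and narration would collapse into one, so a stricter version might also use the running balance or the row position within each statement.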
Compliance touchpoints
- RBI Digital Lending Guidelines — borrower data minimisation; consent for each data source; no data retention beyond purpose.
- AA Master Direction — consent artefact mandatory for AA data.
- DPDP — consent for every collection; purpose limitation; retention schedule.
- KYC MD — KYC-relevant data subject to KYC MD; CKYC upload of identity portion.
- Outsourcing MD — every vendor governed.
MVP vs production
| Feature | MVP | Production |
|---|---|---|
| AA bank-statement pull | ✓ (1–2 AAs) | Multi-AA |
| PDF bank-statement parse | ✓ | ✓ |
| Netbanking fetch | (Optional MVP) | ✓ |
| GST pull | ✓ | ✓ |
| Tally upload + parse | ✓ | ✓ |
| Zoho / Busy / Marg parse | (Phase 2) | ✓ |
| MCA fetch | ✓ | ✓ |
| Bureau pull | ✓ | ✓ |
| Invoice upload + IRN check | (Phase 2) | ✓ |
| Marketplace data | (Phase 3) | ✓ |
| Periodic refresh scheduler | ✓ | ✓ |
| Tampering detection | ✓ (vendor-provided) | ✓ (multi-vendor + own checks) |
Related: 3.E Underwriting engine, 4. Integrations, 2.8 AA rules, 6. Underwriting.