3.D Data ingestion

Acquire every underwriting input — voluntary, consented, or referential — and normalise it into a clean, structured form that the underwriting engine and analyst queue can consume.

This is the single most operationally critical module for cash-flow underwriting. Data-quality errors introduced at this step compound through every downstream module.

  • PDF upload + parse — borrower uploads bank statement PDFs; BSA vendor parses and returns structured data.
  • Netbanking fetch — borrower logs into netbanking through a vendor; statement pulled programmatically.
  • AA-fetched statement — preferred; structured at source, no parsing errors.
  • Multi-month coverage — typically 6 – 24 months.
  • Multi-account merge — for borrowers with several operating accounts.
  • Transaction categorisation — vendor-provided + own enrichment.
  • GST APIs via GSP — borrower consents through a GSTN-issued OTP to a GSP vendor, which then pulls the filings.
  • GSTR-1 (outward supplies) — last 24 – 36 months.
  • GSTR-3B (summary return) — last 24 – 36 months.
  • GSTR-2A / 2B (inward supplies) — last 24 – 36 months.
  • GST profile — turnover, return-filing consistency, late-filing history, suspension flags.
  • E-invoice data — IRN-based for invoices above the e-invoice threshold.
  • E-way bill data — where available; signals logistics activity.
  • ITR data — borrower uploads ITR PDFs or pulls them via an AA-equivalent flow (limited support today).
  • Form 26AS / AIS / TIS — borrower-provided.
  • Tally backup — borrower / CA uploads Tally .tally backup; vendor or in-house parser extracts ledgers.
  • Zoho Books export — JSON / Excel export.
  • Busy / Marg / Vyapar — vendor-specific exports.
  • QuickBooks — for borrowers using QuickBooks Online India.
  • Manual P&L / balance sheet upload — fallback.
  • Receivable ageing, payable ageing, inventory — extracted from accounting data.
  • MCA company / director lookup — auto-pulled at application; refreshed at sanction.
  • Financial statements — for companies, last 3 years of audited financials (where filed).
  • Annual filings (AOC-4, MGT-7) — for compliance status.
  • Bureau pull — see 3.E for usage.
  • Bureau report parsing — structured fields extracted into platform’s schema.
  • Bureau report caching — to avoid duplicate pulls during application lifecycle.
  • Invoice upload for invoice-backed products.
  • GST e-invoice IRN cross-validation — confirms invoice authenticity.
  • PO upload for PO-backed products.
  • Buyer-side confirmation for SCF / anchor-led — anchor uploads or APIs confirm invoice.
  • POS settlement data — for merchant lending (lower priority for the SME wedge).
  • UPI / card settlement via payment-gateway APIs.
  • Amazon Marketplace, Flipkart Seller, Meesho, etc. — seller settlement and sales data; relevant for e-commerce seller financing.
  • Cheque / NACH bounce data — from internal LMS history and (where available) consolidated industry feeds.
  • Litigation data — court case checks via vendors (Karza, Probe42).
  • GST cancellation / suspension feeds — periodic monitoring of borrower’s GSTIN status.
  • The actual underwriting logic — see 3.E.
  • KYC document fetch — see 3.C.
  • AA consent UX — partly here (data pull) and partly in 3.A / 3.B (consent capture).
  • DataPull — per pull: source, vendor, consent reference, timestamp, status, raw response location, parsed location.
  • BankStatement — borrower, account, period, transactions array, vendor parser version.
  • GstReport — borrower, GSTIN, period, returns, GST summary.
  • TallyExport — borrower, file location, ledger summary, parsed JSON location.
  • BureauReport — borrower, CIC, type (consumer / commercial), pull timestamp, parsed JSON.
  • Invoice — invoice number, IRN, value, due, buyer, status.
  • Po — PO data.
  • ParsingError — per failure.
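The DataPull and ParsingError entities above could be sketched as Python dataclasses along the following lines. This is a minimal illustration, not the platform's schema: the field names, the `PullStatus` values, and the `mark_parsed` / `mark_failed` helpers are all assumptions derived from the attribute list ("source, vendor, consent reference, timestamp, status, raw response location, parsed location").

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class PullStatus(str, Enum):
    # Hypothetical status values; the spec only says each pull has a status.
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class ParsingError:
    # One record per parse failure, linked back to the pull that produced it.
    pull_id: str
    message: str


@dataclass
class DataPull:
    pull_id: str
    source: str                       # e.g. "aa", "pdf_bank_statement", "gst"
    vendor: str
    consent_reference: Optional[str]  # None for sources that need no consent artefact
    requested_at: datetime
    status: PullStatus = PullStatus.PENDING
    raw_location: Optional[str] = None     # where the raw vendor response is stored
    parsed_location: Optional[str] = None  # where the normalised output is stored
    errors: List[ParsingError] = field(default_factory=list)

    def mark_parsed(self, raw_location: str, parsed_location: str) -> None:
        self.raw_location = raw_location
        self.parsed_location = parsed_location
        self.status = PullStatus.SUCCEEDED

    def mark_failed(self, message: str) -> None:
        self.errors.append(ParsingError(pull_id=self.pull_id, message=message))
        self.status = PullStatus.FAILED


pull = DataPull(
    pull_id="pull-001",
    source="pdf_bank_statement",
    vendor="example-bsa-vendor",  # placeholder vendor name
    consent_reference=None,
    requested_at=datetime.now(timezone.utc),
)
pull.mark_failed("image-only PDF, no text layer")
```

Keeping the raw and parsed locations as separate fields means a failed parse can be retried from the stored raw response without pulling from the source again.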
  1. AA pull pipeline — consent → fetch (multi-FIP) → encrypt-decrypt → parse → store → notify underwriting.
  2. PDF bank statement — upload → vendor parse → ingest → store; fallback to manual ops if parse fails.
  3. GST pull — borrower OTP → GSP pull → ingest → derive GST profile.
  4. Tally upload — borrower / CA uploads → parser → ledger extraction → underwriting features.
  5. Periodic refresh — for active borrowers (especially WC lines), refresh GST and bank on cadence.
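
Each workflow above is a linear chain of stages with a fallback when a stage fails. A minimal sketch of that shape, using the PDF bank statement workflow as the example, might look like the following. The `run_pipeline` helper, the stage functions, and the in-memory "manual ops queue" are hypothetical stand-ins; a real implementation would call the BSA vendor and a persistent queue.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


@dataclass
class PipelineResult:
    ok: bool
    payload: Any = None
    failed_stage: Optional[str] = None
    error: Optional[str] = None


def run_pipeline(
    stages: Sequence[Stage],
    payload: Any,
    on_failure: Callable[[str, Exception], None],
) -> PipelineResult:
    """Run stages in order; stop at the first failure and report which stage failed."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            on_failure(name, exc)
            return PipelineResult(ok=False, failed_stage=name, error=str(exc))
    return PipelineResult(ok=True, payload=payload)


# Hypothetical stand-ins for the vendor parser, the store, and the manual ops queue.
manual_ops_queue: List[Tuple[str, str]] = []
stored: List[dict] = []


def vendor_parse(pdf: bytes) -> dict:
    if not pdf.startswith(b"%PDF"):
        raise ValueError("not a readable PDF")
    return {"transactions": []}


def store(parsed: dict) -> dict:
    stored.append(parsed)
    return parsed


def to_manual_ops(stage: str, exc: Exception) -> None:
    manual_ops_queue.append((stage, str(exc)))


pdf_stages = [("vendor_parse", vendor_parse), ("store", store)]
good = run_pipeline(pdf_stages, b"%PDF-1.7 sample", to_manual_ops)
bad = run_pipeline(pdf_stages, b"garbage", to_manual_ops)
```

Recording the name of the failed stage makes it straightforward to populate a ParsingError record and to tell operations staff where the pull stopped.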

See Section 4 for vendor detail. Key categories:

  • BSA: Perfios, FinBox BankConnect, Precisa, ScoreMe, Karza.
  • AA TSPs: Setu, FinBox, OneMoney FIU SDK.
  • GSPs: Cygnet, Karix, Webtel, MasterIndia, Taxgenie, Vayana.
  • MCA / commercial bureau: Karza, Probe42, Tofler, Signzy.
  • Tally / accounting parsers: Karza, Perfios, vendor-specific.
  • Bureau APIs: CIBIL, Experian, Equifax, CRIF (direct or via aggregator).
  • Invoice / e-invoice: GST IRP via GSP.
  • POST /data/aa/consent-request — initiate AA consent.
  • POST /data/aa/fetch — trigger fetch given a valid consent.
  • POST /data/bank-statement/upload — upload PDF for parsing.
  • POST /data/gst/pull — pull GST returns given borrower consent.
  • POST /data/tally/upload — upload Tally backup.
  • POST /data/mca/fetch — fetch MCA data.
  • POST /data/bureau/pull — pull bureau report.
  • POST /data/invoices/bulk-upload — bulk-upload invoices.
  • GET /data/pulls/{id} — status of a data pull.
  • GET /borrowers/{id}/data-room — aggregated view of all data pulled.
  • data.aa.consent_granted / data.aa.fetch_succeeded / data.aa.fetch_failed
  • data.bank_statement.parsed / data.bank_statement.parse_failed
  • data.gst.pulled
  • data.tally.parsed
  • data.bureau.pulled
  • data.refresh.due (periodic)
  • data.quality.alert (anomaly detection)
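The event names above could be carried in a small envelope that validates the name before publishing. The following is a sketch under assumptions: the envelope fields (`borrower_id`, `pull_id`, `occurred_at`, `detail`) and the `make_event` / `to_json` helpers are hypothetical; only the event names come from the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

KNOWN_EVENTS = frozenset({
    "data.aa.consent_granted",
    "data.aa.fetch_succeeded",
    "data.aa.fetch_failed",
    "data.bank_statement.parsed",
    "data.bank_statement.parse_failed",
    "data.gst.pulled",
    "data.tally.parsed",
    "data.bureau.pulled",
    "data.refresh.due",
    "data.quality.alert",
})


@dataclass(frozen=True)
class DataEvent:
    name: str
    borrower_id: str
    pull_id: Optional[str]
    occurred_at: str  # ISO 8601 timestamp, UTC
    detail: dict = field(default_factory=dict)


def make_event(
    name: str,
    borrower_id: str,
    pull_id: Optional[str] = None,
    detail: Optional[dict] = None,
    now: Optional[datetime] = None,
) -> DataEvent:
    """Build an event, rejecting names that are not in the documented list."""
    if name not in KNOWN_EVENTS:
        raise ValueError(f"unknown event name: {name}")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return DataEvent(name, borrower_id, pull_id, ts, dict(detail or {}))


def to_json(event: DataEvent) -> str:
    return json.dumps(asdict(event), sort_keys=True)


evt = make_event(
    "data.bank_statement.parse_failed",
    borrower_id="b-42",
    pull_id="pull-001",
    detail={"reason": "image-only PDF"},
    now=datetime(2024, 1, 5, 10, 30, tzinfo=timezone.utc),
)
payload = to_json(evt)
```

Rejecting unknown names at construction time keeps typos such as `data.gst.pull` from silently reaching consumers that subscribe by exact name.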
  • Bank statement PDF is image-only (scanned) — OCR fallback; output quality varies widely, and only some vendors handle scanned PDFs.
  • Tampered bank statement — many vendors detect tampering signals (text-layer inconsistencies, fonts, totals); the workflow acts on the vendor's flag.
  • AA returns partial data — borrower selected only some accounts; reconcile with claimed accounts; nudge for missing.
  • GST return not filed for recent period — risk signal; halt fast-track flow.
  • Tally export password-protected — borrower must supply password or unprotect.
  • MCA company status “Struck off” — block.
  • Bureau “No record” — could be valid (new business) or fraud (PAN mismatch); investigate.
  • Invoice IRN not found on GST IRP — invoice may be old / below e-invoice threshold / fraudulent.
  • Bank statement spans multiple PDFs with overlap — dedupe transactions on hash + ref + amount + date.
  • Multi-account merge with inter-account transfers — circular transactions detected and netted; double counting avoided.
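The last two edge cases above (overlapping PDFs and inter-account transfers) can be illustrated with a short sketch. It deduplicates on a hash of account, date, amount, reference and narration, then pairs each debit with a same-day credit of equal amount in a different account of the same borrower. The `Txn` shape and the matching rule are assumptions for illustration; real matching would likely also use reference numbers and a date tolerance.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Txn:
    account: str
    txn_date: date
    amount: Decimal  # signed: credits positive, debits negative
    ref: str
    narration: str


def txn_key(t: Txn) -> str:
    """Stable hash over normalised fields, so the same row from two PDFs collides."""
    parts = [
        t.account,
        t.txn_date.isoformat(),
        str(t.amount.quantize(Decimal("0.01"))),
        t.ref.strip().upper(),
        " ".join(t.narration.split()).upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def dedupe(txns: List[Txn]) -> List[Txn]:
    """Keep the first occurrence of each key, preserving order."""
    seen = set()
    out = []
    for t in txns:
        key = txn_key(t)
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def find_internal_transfers(txns: List[Txn]) -> List[Tuple[int, int]]:
    """Return (debit_index, credit_index) pairs that look like own-account transfers."""
    credits: Dict[Tuple[date, Decimal], List[int]] = {}
    for i, t in enumerate(txns):
        if t.amount > 0:
            credits.setdefault((t.txn_date, t.amount), []).append(i)
    pairs = []
    for i, t in enumerate(txns):
        if t.amount < 0:
            candidates = credits.get((t.txn_date, -t.amount), [])
            for j in candidates:
                if txns[j].account != t.account:
                    candidates.remove(j)  # each credit is matched at most once
                    pairs.append((i, j))
                    break
    return pairs


a = Txn("A", date(2024, 1, 5), Decimal("-5000.00"), "NEFT123", "To own acct")
b = Txn("B", date(2024, 1, 5), Decimal("5000.00"), "NEFT123", "From own acct")
c = Txn("A", date(2024, 1, 6), Decimal("1200.50"), "UPI9", "Customer  payment")
c_dup = Txn("A", date(2024, 1, 6), Decimal("1200.5"), "upi9 ", "customer payment")

unique = dedupe([a, b, c, c_dup])
transfers = find_internal_transfers(unique)
```

One caveat of hashing alone: two genuinely distinct transactions with identical date, amount, reference and narration would be collapsed, so a production version might also compare running balances or row positions before dropping a row.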
  • RBI Digital Lending Guidelines — borrower data minimisation; consent for each data source; no data retention beyond purpose.
  • AA Master Direction — consent artefact mandatory for AA data.
  • DPDP — consent for every collection; purpose limitation; retention schedule.
  • KYC MD — KYC-relevant data subject to KYC MD; CKYC upload of identity portion.
  • Outsourcing MD — every vendor governed.
Feature                      | MVP                 | Production
AA bank-statement pull       | ✓ (1–2 AAs)         | Multi-AA
PDF bank-statement parse     |                     |
Netbanking fetch             | (Optional MVP)      |
GST pull                     |                     |
Tally upload + parse         |                     |
Zoho / Busy / Marg parse     | (Phase 2)           |
MCA fetch                    |                     |
Bureau pull                  |                     |
Invoice upload + IRN check   | (Phase 2)           |
Marketplace data             | (Phase 3)           |
Periodic refresh scheduler   |                     |
Tampering detection          | ✓ (vendor-provided) | ✓ (multi-vendor + own checks)

Related: 3.E Underwriting engine, 4. Integrations, 2.8 AA rules, 6. Underwriting.