Which service to use for Bank Statement Reconciliation feature with Node.js, SQL Server, and Fuzzy Matching.

Sumit Kumar Yadav 20 Reputation points
2025-11-20T08:20:21.91+00:00

Use Case Summary

We need to match uploaded bank statements (per company, per month, up to ~2,000 rows) against our internal bill database (over 400,000 records). Each statement row must be matched to the best possible bill using:

Date Amount Vendor name (fuzzy)

We return:

Vendor match % 
Amount match % 
Date match % 
Final combined score
BillId

Best match for each statement (and later top 3) Operations team uses this to quickly approve correct matches.
Database Structure

We have two main tables:

BillPay — bill metadata

Id, 
VendorName, 
TotalCost, 
TotalPaid, 
DueDate, 
BillStatus

BillPayment — individual payments

Id,
BillPayId
RecordId, 
AmtPaid, 
ProcFee, 
LateFee, 
OtherFee, 
DatePaid

Important business rules:

If a bill is paid in multiple parts (e.g., $60 + $40), each payment is a separate row in BillPayment.
Both statements ($60 and $40) should match the same BillPay Id.

Some bills have no payment rows yet (BillStatus = ‘Scheduled’ or ‘Auto’). In these cases, we must match using TotalPaid > 0 else TotalCost.

Currently, I tried querying all data from the database by company and month, then using fuzzy string matching packages like Fuse.js and Fuzzball to filter and match vendor names, as well as matching amounts within ±$5 and dates within ±2 days.

Also I looked into Azure AI Search for it, and it gives a score. However, I don't think I can store the data for just two tables. And since I have to look up for each statement, could you please advise me on the best service or approach?

Azure AI Search
Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
0 comments No comments
{count} votes

Answer accepted by question author
  1. Jerald Felix 9,840 Reputation points
    2025-11-20T11:41:13.58+00:00

    Hello Sumit Kumar Yadav,

    Thanks for raising this question in Q&A forum.

    I understand you need to reconcile bank statement rows (~2k) against a large Bill database (~400k) using fuzzy logic on Vendor Names, while strictly respecting Date (±2 days) and Amount (±$5) constraints, and you need a detailed percentage breakdown of the match score.

    While Azure AI Search is excellent for unstructured text, your requirements involve structured arithmetic constraints (Dates/Amounts) combined with fuzzy text. A pure search engine gives a "Relevance Score" (e.g., 2.5), which is difficult to convert into the specific "Amount match %" or "Date match %" you need for your business logic.

    Given your stack (Node.js + SQL Server), the most efficient and cost-effective architecture is the "Candidate Filtering" Pattern.

    You mentioned that fetching 400k rows into Node.js is not feasible. You are correct. Instead, you should let SQL Server do the heavy lifting to reduce the search space first.

    Step 1: Candidate Generation (SQL Server)

    For each Bank Statement row, execute a targeted SQL query to fetch only "plausible" candidates. SQL is extremely fast at range queries if indexed correctly.

    • Input: Statement Date ($D_{stmt}$), Statement Amount ($A_{stmt}$)
    • SQL Query Logic:
      
          SELECT Id, VendorName, TotalCost, DueDate 
      
          FROM BillPay 
      
          WHERE DueDate BETWEEN DATEADD(day, -2, @StmtDate) AND DATEADD(day, 2, @StmtDate)
      
            AND TotalCost BETWEEN (@StmtAmount - 5) AND (@StmtAmount + 5)
      
      
    • Result: Instead of 400,000 rows, this returns perhaps 0 to 10 candidates per statement line.

    Step 2: Fuzzy Scoring (Node.js)

    Now that you have a small array of candidates in memory (e.g., 5 objects), you can use your existing libraries (fuzzball or Fuse.js) effectively without performance issues.

    • Vendor Score: Run fuzzball.ratio(stmtVendor, candidateVendor).
    • Date Score: Calculate the difference in days and assign a percentage (e.g., 0 days = 100%, 1 day = 90%).
    • Amount Score: Calculate percentage difference.
    • Final Score: Weighted Average (e.g., (Vendor% * 0.5) + (Amount% * 0.3) + (Date% * 0.2)).

    Step 3: Selection

    Sort candidates by Final Score and store the top match (or top 3) in your database.

    • Overkill: Indexing 400k structured rows just to filter 99% of them out by Date/Amount is unnecessary overhead.
    • Scoring Complexity: Azure AI Search uses BM25 (text relevance). Forcing it to rank "Date" and "Amount" proximity strictly requires complex "Scoring Profiles" and still won't give you the pretty "98% Match" breakdown you want to show your operations team.

    Summary: Keep the logic in your Node.js/SQL layer. Use SQL for the "Hard" filters (Range) and Node.js for the "Soft" match (Fuzzy Name & Math).

    If helps, approve the answer.

    Best Regards,

    Jerald Felix

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.