Hello Sumit Kumar Yadav,
Thanks for raising this question in Q&A forum.
I understand you need to reconcile bank statement rows (~2k) against a large Bill database (~400k) using fuzzy logic on Vendor Names, while strictly respecting Date (±2 days) and Amount (±$5) constraints, and you need a detailed percentage breakdown of the match score.
While Azure AI Search is excellent for unstructured text, your requirements involve structured arithmetic constraints (Dates/Amounts) combined with fuzzy text. A pure search engine gives a "Relevance Score" (e.g., 2.5), which is difficult to convert into the specific "Amount match %" or "Date match %" you need for your business logic.
Given your stack (Node.js + SQL Server), the most efficient and cost-effective architecture is the "Candidate Filtering" Pattern.
Recommended Approach: SQL (Filtering) + Node.js (Scoring)
You mentioned that fetching 400k rows into Node.js is not feasible. You are correct. Instead, you should let SQL Server do the heavy lifting to reduce the search space first.
Step 1: Candidate Generation (SQL Server)
For each Bank Statement row, execute a targeted SQL query to fetch only "plausible" candidates. SQL is extremely fast at range queries if indexed correctly.
- Input: Statement Date ($D_{stmt}$), Statement Amount ($A_{stmt}$)
- SQL Query Logic:
SELECT Id, VendorName, TotalCost, DueDate FROM BillPay WHERE DueDate BETWEEN DATEADD(day, -2, @StmtDate) AND DATEADD(day, 2, @StmtDate) AND TotalCost BETWEEN (@StmtAmount - 5) AND (@StmtAmount + 5) - Result: Instead of 400,000 rows, this returns perhaps 0 to 10 candidates per statement line.
Step 2: Fuzzy Scoring (Node.js)
Now that you have a small array of candidates in memory (e.g., 5 objects), you can use your existing libraries (fuzzball or Fuse.js) effectively without performance issues.
- Vendor Score: Run
fuzzball.ratio(stmtVendor, candidateVendor). - Date Score: Calculate the difference in days and assign a percentage (e.g., 0 days = 100%, 1 day = 90%).
- Amount Score: Calculate percentage difference.
- Final Score: Weighted Average (e.g.,
(Vendor% * 0.5) + (Amount% * 0.3) + (Date% * 0.2)).
Step 3: Selection
Sort candidates by Final Score and store the top match (or top 3) in your database.
Why not Azure AI Search?
- Overkill: Indexing 400k structured rows just to filter 99% of them out by Date/Amount is unnecessary overhead.
- Scoring Complexity: Azure AI Search uses BM25 (text relevance). Forcing it to rank "Date" and "Amount" proximity strictly requires complex "Scoring Profiles" and still won't give you the pretty "98% Match" breakdown you want to show your operations team.
Summary: Keep the logic in your Node.js/SQL layer. Use SQL for the "Hard" filters (Range) and Node.js for the "Soft" match (Fuzzy Name & Math).
If helps, approve the answer.
Best Regards,
Jerald Felix