Case Studies
16 Jul 2024
How this data aggregator unlocked a commercial banking customer base with LLMs
Author
Naré Vardanyan
Co-Founder and CEO
The Need
Open banking in Europe has been around for a while, and there are many companies across financial services and beyond reaping the benefits. These benefits range from faster payments to easy account switching, better personal finance management and finally more accurate and quicker underwriting for small and medium businesses.
Traditionally, banks have not been able to operationally justify underwriting SMBs.
If you are a bank deploying capital off your balance sheet, it makes more sense to spend the same amount of time and resources underwriting a big loan rather than a tiny one...
Until open banking and account aggregation became available, there was no way of standardizing this process. Small businesses do not have up-to-date accounting logs or balance sheets, but they do have bank data you can base decisions on.
One of Europe’s major open banking providers, which offers data connectivity to top brands such as Stripe, Revolut and many others, needed help activating and enriching this data. The goal was to let its customers in the SMB lending space make standardized, quick credit decisions at scale, which is the only path to making this type of lending profitable.
Evaluation
The evaluation process was straightforward. They were looking for dynamic labels not constrained to a fixed label hierarchy and a good overview of all the P&L categories to automatically calculate the full P&L from a bank feed.
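To make the "full P&L from a bank feed" idea concrete, here is a minimal sketch of the aggregation step. It assumes each transaction already carries a label; the sample rows, the label names and the build_pnl helper are hypothetical and are not Ntropy's actual label set or API.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical labeled bank feed: (description, signed amount, P&L label).
# Labels and rows are illustrative only, not a real label hierarchy.
transactions = [
    ("STRIPE PAYOUT", Decimal("12500.00"), "revenue"),
    ("AWS EMEA", Decimal("-830.40"), "software and hosting"),
    ("PAYROLL JUNE", Decimal("-6200.00"), "salaries"),
    ("OFFICE RENT", Decimal("-1500.00"), "rent"),
]


def build_pnl(labeled):
    """Sum signed amounts per label and return (per-label totals, net result)."""
    totals = defaultdict(Decimal)  # Decimal() defaults to 0
    for _description, amount, label in labeled:
        totals[label] += amount
    net = sum(totals.values(), Decimal("0"))
    return dict(totals), net


totals, net = build_pnl(transactions)
for label, amount in totals.items():
    print(f"{label}: {amount}")
print(f"net: {net}")  # net: 3969.60
```

Because each row contributes its full amount to exactly one line, a single wrong label moves that whole amount to the wrong P&L line.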
In this case, every misclassified transaction can make a huge difference. This is what makes business transaction categorization tricky and a previously unsolved problem. It remained unsolved until LLMs became available and you could leverage Ntropy AI to access them at scale.
We benchmarked 2,000 transactions using claude-3.5-sonnet together with the Ntropy caching infrastructure.
The numbers were impressive, and I have not seen anyone in the field come even close. We assigned the correct P&L label to 94.62% of all transactions, with 95.16% of the merchants being identified and 97.85% of all intermediaries being correctly extracted.
Keep in mind that context matters a lot in business classification, and the majority of the transactions that were not labeled did not contain enough information. Even a human with access to additional tools and common sense would fail to mark these correctly.
The closest anyone has come to these results is Slope, who published their numbers while training a model they called SlopeGPT (https://medium.com/slope-stories/slopegpt-the-first-payments-risk-model-powered-by-gpt-4-cd444ab5242d).
Slope TransFormer is a proprietary LLM fine-tuned to extract meaning from bank transactions. It produces accurate, concise counterparty labels in an interpretable, deterministic way.
The numbers they published were the following:
Source | Accuracy % (exact match) | Jaccard Similarity (fuzzy match) |
Human (ground truth) | 100% | 100% |
Plaid | 62.0% | 82.8% |
Slope TransFormer | 72.5% | 87% |
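For readers unfamiliar with the two columns, the sketch below shows one common way to compute them. It assumes case-insensitive exact matching and a token-level Jaccard similarity over whitespace-separated words; the exact definitions Slope used are not given here, and the sample label pairs are made up for illustration.

```python
def exact_match(predicted: str, truth: str) -> bool:
    """True when the labels are identical, ignoring case and outer whitespace."""
    return predicted.strip().lower() == truth.strip().lower()


def jaccard(predicted: str, truth: str) -> float:
    """Token-level Jaccard similarity: |A ∩ B| / |A ∪ B| over lowercase words."""
    a = set(predicted.lower().split())
    b = set(truth.lower().split())
    if not a and not b:
        return 1.0  # two empty labels are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical (predicted, ground truth) counterparty labels.
pairs = [
    ("Amazon Web Services", "Amazon Web Services"),  # exact hit, Jaccard 1.0
    ("Amazon", "Amazon Web Services"),               # partial, Jaccard 1/3
    ("Uber Eats", "Uber"),                           # partial, Jaccard 1/2
]

accuracy = sum(exact_match(p, t) for p, t in pairs) / len(pairs)
mean_jaccard = sum(jaccard(p, t) for p, t in pairs) / len(pairs)
print(f"exact match: {accuracy:.1%}, mean Jaccard: {mean_jaccard:.1%}")
```

The fuzzy score gives partial credit for near misses, which is why it sits above the exact-match figure in every row of the table.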
Compared to Plaid, which is one of the most popular open banking players out there and a top data aggregator in the US, Slope's performance is significantly better.
However, zero-shot claude-3.5-sonnet powered by Ntropy infrastructure outperformed both by a significant margin.
ROI
Unlocking SMB underwriting at scale is a major advantage for all financial institutions. It allows this open banking player to expand its customer base into commercial banking and to step into areas ripe for disruption, such as audit, accounting and more.
This is a zero-to-one capability that is becoming available thanks to LLMs.
The cost reduction with the Ntropy solution versus a naked LLM is 155x in this case, with a further 163x reduction in p50 latency, while the accuracy benefits make the case for automating business underwriting for SMBs.
Volume: 30M txs / day
Base model: claude-3.5-sonnet
Uncached cost: 360k / day
With Ntropy cache: 2.3k / day
Cost reduction: 155x
Uncached p50 latency: 18s
Uncached p95 latency: 44s
Cached p50 latency: 110ms
Cached p95 latency: 190ms
Latency reduction (p50): 163x
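The ratios can be sanity-checked from the rounded figures listed here. The sketch below is plain arithmetic; the variable names are mine, and the currency is left unspecified because the figures do not state one.

```python
# Figures as listed above (rounded); currency is not stated.
volume_per_day = 30_000_000
uncached_cost_per_day = 360_000
cached_cost_per_day = 2_300

cost_ratio = uncached_cost_per_day / cached_cost_per_day       # about 156.5
uncached_cost_per_tx = uncached_cost_per_day / volume_per_day  # 0.012 per transaction
cached_cost_per_tx = cached_cost_per_day / volume_per_day      # about 0.000077 per transaction

# Latencies converted to milliseconds before dividing.
p50_ratio = 18_000 / 110  # about 163.6
p95_ratio = 44_000 / 190  # about 231.6

print(f"cost reduction: {cost_ratio:.1f}x")
print(f"cost per tx: {uncached_cost_per_tx:.4f} uncached, {cached_cost_per_tx:.6f} cached")
print(f"latency reduction: p50 {p50_ratio:.1f}x, p95 {p95_ratio:.1f}x")
```

The rounded inputs give roughly 156x on cost, close to the reported 155x, which was presumably computed from unrounded figures. The 163x latency figure matches the p50 ratio; at p95 the same calculation gives roughly 232x.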