Company, Technical, Podcast
17 Dec 2023
Podcast: Building LLMs to understand financial data with Ntropy co-founder and CTO Ilia Zintchenko
Author
Ilia Zintchenko
Co-founder and CTO
This podcast episode features Ilia Zintchenko, co-founder and CTO of Ntropy. Check out the summary and highlights below.
Highlights
As a CTO, what are the considerations in figuring out what tools you were going to use to build language models?
At Ntropy we have always been a language model company and our stack was always focused on different types and sizes of language models.
Back when we started out in 2018, language models were much smaller than the ones that are popular today and, given their size, were cheaper to train and run.
There is a trade-off across a few key dimensions with language models:
- Accuracy
- Latency
- Reliability
- Cost
With larger language models and other tricks, we have pushed the boundaries on all of these dimensions.
LLMs are expensive to run and not very tunable. What we are focused on is how to bring the performance and reasoning of LLMs into a stack that needs to process hundreds of millions to billions of transactions per month at a reasonable cost.
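One common pattern for getting LLM-level quality at high throughput is a model cascade: route each transaction to a small, cheap model first and fall back to a larger model only when confidence is low. The sketch below illustrates the idea; the function names, labels, and thresholds are hypothetical stand-ins, not Ntropy's actual stack.

```python
# Hypothetical model-cascade sketch: cheap model handles the bulk of
# traffic, expensive model handles low-confidence edge cases only.

def small_model(text: str) -> tuple[str, float]:
    """Stand-in for a small, cheap classifier: (label, confidence)."""
    if "starbucks" in text.lower():
        return ("coffee_shop", 0.97)
    return ("unknown", 0.40)

def large_model(text: str) -> tuple[str, float]:
    """Stand-in for a large, slow, expensive model with better reasoning."""
    return ("restaurant", 0.90)

def classify(text: str, threshold: float = 0.85) -> str:
    label, confidence = small_model(text)
    if confidence >= threshold:
        return label                  # cheap path: most transactions
    label, _ = large_model(text)      # expensive path: rare hard cases
    return label

print(classify("STARBUCKS #1234 SEATTLE"))  # coffee_shop (cheap path)
print(classify("XJ*K9 PAYMENT 0042"))       # restaurant (fell back)
```

Because typical transaction traffic is heavily skewed toward a small set of common merchants, even a simple cascade like this can keep average per-transaction cost close to the small model's.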
Why is it significant that you and your team have been using language models from very early?
I think the problem Ntropy is solving is not really new. A lot of the initial techniques were based on merchant lookup tables, rules-based approaches, and human review.
The approach we're taking to solve this problem has only been possible since the advent of transformer models in 2017/18. A lot of the current players in this market are still using a rules-based approach and merchant lookup tables. The performance of these approaches, we believe, is capped at a certain point: you're just not going to be able to handle the important edge cases with rules alone. Language models are really key here.
The value of a bank transaction varies, so sometimes it is worth spending more effort to understand the data more accurately, but in any case, cost matters.
Machine learning models are not cheap, so what we're doing at Ntropy is giving you both: costs similar to rules-based systems, but with the performance of large language models.
What sort of data sources have you been using? And I'm curious if there are some that are better than others. Tell me your perspective there.
To properly understand financial data, you need to know information about the real world. You need to understand what a merchant is, what a location is. You also need to know what dates are and names.
There are many merchant databases out there that can help, but none of them are perfect and they should not be the sole solution.
We combine a variety of approaches, including merchant databases and SERP, into our stack. Ultimately we are trying to get information from as many sources as possible to piece together the truth. Currently our merchant database covers 100 million+ unique merchants.
The other part is the financial data itself: transaction data, invoices, and bank statements. This data tells you a lot about the entities involved, so internally we are building up a payments graph.
What sort of system costs are we talking about here? Is it primarily compute? Is it infrastructure? Is it a mixture of both? Give me a little bit more context there.
The two largest factors are the cost of caching (memory) and the cost of compute (GPUs). These two costs are also tied together.
For a given transaction or invoice, you spend some compute to process it, but how long do you store the result? If it is an edge-case transaction you are unlikely to see again soon, you might decide not to store it, because storing it would cost more than recomputing it later.
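The trade-off described above can be made concrete with a back-of-the-envelope rule: cache a result only when the expected cost of recomputing it (compute cost times expected repeat hits over the retention window) exceeds the cost of storing it. This is a sketch with illustrative numbers, not a real pricing model.

```python
# Cache-vs-recompute sketch: store a result only if expected recompute
# cost exceeds storage cost over the retention window. Numbers are
# illustrative, not real infrastructure prices.

def should_cache(compute_cost: float,
                 storage_cost_per_day: float,
                 retention_days: float,
                 expected_repeats: float) -> bool:
    expected_recompute_cost = compute_cost * expected_repeats
    storage_cost = storage_cost_per_day * retention_days
    return expected_recompute_cost > storage_cost

# A common merchant string seen hundreds of times: worth caching.
print(should_cache(0.001, 0.000001, 30, 500))   # True
# A one-off edge case unlikely to recur: recompute if it ever comes back.
print(should_cache(0.001, 0.000001, 30, 0.01))  # False
```

The same arithmetic explains why the two costs are "tied together": cutting compute cost (e.g. with smaller models) lowers the threshold at which caching stops paying for itself.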
How do you optimize for reliability and how do you measure that? What sort of levers are you pulling to ensure that your model is reliable?
It's many factors, and we have a dedicated team of people responsible for this. Every time you send the same input to our pipeline, you should expect the same output.
A big part of our stack is not only language models, but also merchant databases, search engines, and caches, where each entry in the cache has a shelf life and is deleted after some point. We still need to make sure that when a customer sends the same input again, they get the same output. This is one part we have live testing for: we run fixed sets of transactions to verify the output stays the same, and we flag and investigate if it's different.
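The fixed-set check described above amounts to a golden-set regression test: run a frozen set of transactions through the pipeline and flag any output that differs from the stored expected result. The pipeline and data below are stand-ins to show the shape of such a check.

```python
# Golden-set regression sketch: frozen inputs with expected outputs;
# any mismatch is flagged for investigation. Pipeline is a stand-in.

GOLDEN_SET = {
    "STARBUCKS #1234 SEATTLE": "coffee_shop",
    "SHELL OIL 5551212": "gas_station",
}

def pipeline(text: str) -> str:
    """Stand-in for the enrichment pipeline under test."""
    return "coffee_shop" if "STARBUCKS" in text else "gas_station"

def run_regression(golden: dict[str, str]) -> list[str]:
    """Return the inputs whose output changed; empty list means stable."""
    return [tx for tx, expected in golden.items() if pipeline(tx) != expected]

flagged = run_regression(GOLDEN_SET)
print(flagged)  # [] when the pipeline output is unchanged
```

Running a check like this continuously catches nondeterminism from expired cache entries or upstream data changes before customers see it.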