Gurjeet Singh, co-founder and Executive Chairman of Ayasdi, spoke to bobsguide about the challenges of compliance with anti-money laundering regulation, the characteristics of AI, and how AI is vastly improving false positive rates on suspicious activity reports.
How did you find your way to Ayasdi?
I did my graduate work in computational mathematics and computer science, and I have a deep personal interest in robotics and machine learning that goes back a number of years. Around 2000, DARPA and NSF, the US scientific funding agencies, realised that most scientific research funding was going into the creation of data. A lot of new science happened when people analysed that data and discovered things they might not previously have expected.
One good example of this is the human genome project, where we collected the first seven human genomes at great expense. We didn't exactly know what we were going to find in that data, and a lot of amazing science came out of it. DARPA and NSF thought that perhaps the people producing the data might not be the best people to analyse it. They noticed that computing keeps getting cheaper and started investing in technologies that could heavily augment how scientists discover information from large, complex data sets.
I was part of one such research effort in the Stanford Math Department, and in 2008 DARPA asked us to commercialise our technology. That's how I came to co-found Ayasdi.
What are the challenges companies face when dealing with Big Data?
The first challenge is defining Big Data. The term itself refers to the size of the data, which is one form of complexity, but many other forms of complexity arise in datasets pretty frequently. For example, in risk mitigation in financial services the datasets tend to be tiny, but they have lots and lots of variables, and figuring out which combination of variables to use in a model is very difficult. Nonlinearity is another type of complexity. So the term 'big data' masks a lot of different complexities.
And Ayasdi meets that challenge with Topological Data Analysis (TDA).
It’s best to think of TDA as a way of combining the results of various machine learning methods. When most of the world talks about AI and machine learning, I often joke that in 2017 it has never been easier to be in the AI industry. Some firms develop predictive analytics and call themselves AI-enabled, but the domain of AI is much broader than prediction alone.
So with that in mind, there are five key characteristics that every application of AI must exhibit to be called AI.
The first is discovery: the ability of AI to find information in large, complex datasets without upfront human intervention. In technical terms, these are unsupervised or semi-supervised machine learning techniques (segmentation, dimensionality reduction, anomaly detection, and so on). The vast majority of data in large enterprises tends to be unlabelled, meaning you don’t know whether a record represents a good outcome or a bad one. To demonstrate that with AML, a large bank like HSBC will investigate approximately a million transactions, and only about 2% of those transactions are ever filed as suspicious. For the vast majority of the data you don’t know if the transactions were good or bad, because you haven’t been able to investigate all of them, and of the small set you have investigated, you’ve only filed a few suspicious activity reports. That’s where unsupervised learning is critical, because you have to make discoveries in the data before you can begin the modelling process.
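The unsupervised setting Singh describes can be sketched in a few lines. The example below is a generic illustration, not Ayasdi's TDA: it flags anomalous transactions with no labels at all, by scoring each amount against that customer's own history. All names, data, and the z-score cutoff are hypothetical.

```python
from statistics import mean, stdev
from collections import defaultdict

def flag_anomalies(transactions, z_cutoff=3.0):
    """Unsupervised anomaly detection: flag any transaction whose amount
    deviates strongly from that customer's own history. No labels needed."""
    by_customer = defaultdict(list)
    for cust, amount in transactions:
        by_customer[cust].append(amount)

    flagged = []
    for cust, amount in transactions:
        amounts = by_customer[cust]
        if len(amounts) < 3:          # too little history to judge
            continue
        mu, sigma = mean(amounts), stdev(amounts)
        if sigma > 0 and abs(amount - mu) / sigma > z_cutoff:
            flagged.append((cust, amount))
    return flagged

txns = [("alice", a) for a in (100, 105, 98, 102, 5000)] + \
       [("bob", a) for a in (2000, 2100, 1950)]
print(flag_anomalies(txns, z_cutoff=1.5))   # → [('alice', 5000)]
```

Nothing here knows what "suspicious" means; the structure of the data alone surfaces the outlier, which is the essence of the discovery step.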
The second is that AI must be able to predict. There's plenty written on that already, so I won't go into it; it's probably well understood.
The third is the ability to justify. In the next five to ten years, the vast majority of enterprise systems will be heavily augmented, if not outright automated. On the journey to that future, machines need to justify outcomes to their human operators: every suggestion and prediction, every segment and anomaly. Being able to justify is critical for building trust.
The fourth is the ability to act: to put these AI systems into practice and make them ‘live’ so they carry out the discover, predict and justify functions effectively. In a lot of large enterprises, hardly any AI experiments make it to production, because they can’t pass that test in the real world.
The final ability is to learn as the data evolves and the underlying distributions in the data change. The system needs to be able to monitor itself, learn from what it sees, and say: this data has changed, and I recommend you update your system in the following ways.
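A minimal sketch of that self-monitoring idea, assuming the simplest possible drift signal (a shift in the mean relative to a baseline window; real systems compare full distributions). The function name and cutoff are hypothetical.

```python
from statistics import mean, stdev

def drift_alert(baseline, recent, z_cutoff=2.0):
    """Self-monitoring sketch: alert when the recent data's mean drifts
    more than z_cutoff baseline standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return sigma > 0 and abs(mean(recent) - mu) / sigma > z_cutoff

baseline = [100, 102, 98, 101, 99]
print(drift_alert(baseline, [100, 101, 99]))   # stable data → False
print(drift_alert(baseline, [140, 150, 145]))  # distribution shifted → True
```

When the alert fires, a production system would retrain or recalibrate the downstream models rather than keep scoring against a stale distribution.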
What changes are you beginning to see in your industry?
I expect the regulatory environment to change in particular, and this will open up many opportunities for AI.
I think a lot of the focus for banks now is on improving risk profiling, and they are beginning to use AI for that problem while at the same time trying to optimise their operations so they can conduct more investigations into the mechanics of suspicious activities.
I think regulators are beginning to realise the implications of what happens when banks can share data amongst themselves, particularly when that’s customer data. I expect to see this change in the regulatory requirement in the next couple of years.
The second concerns what banks do with suspicious activity reports. The information that banks file to the regulators is just suspicion. In many cases, when banks run out of time to investigate a particular transaction, they file it defensively: they have looked at all the data, couldn't determine whether the transaction was genuine, and so file it as suspicious anyway. Regulators are also beginning to realise that for banks to operate properly on their data, they need feedback, so that's a change we're seeing on the horizon. I don't think it's lost on the regulators that what they're asking banks to do is a difficult endeavour.
I’ve heard Ayasdi has been working with HSBC, please explain how.
HSBC is transforming their approach to financial crime risk, and that involves looking at different technologies and processes. Their goal is to amplify their investigative teams in such a way that they catch every bad actor they possibly can. Given that the false positive rate across the industry is well north of 90%, a key leverage point for increasing the efficiency and efficacy of these operations is to reduce false positives without changing the bank's risk envelope.
The reason they have these false alerts is that they use a scenario-based approach. To illustrate: an 80-year-old in Greece making multi-thousand-dollar transactions each week is unusual (this is a made-up example). Banks come up with these rules, put them into the transaction monitoring system, and the system raises a flag whenever a rule is breached. They use this approach partly because the financial regulators ask them to: regulators and banks agree on the scenarios that both deem risky, and transactions are screened against those scenarios to check compliance.
The problem is that these scenarios are very coarse because the data is immense, and these scenarios essentially ignore almost all of it.
The way HSBC is using our software is to discover segments of customers, or pseudo-customers, before any transaction monitoring has happened. They use all the data they have to discover these segments and fine-tune the scenarios for each segment. This dramatically reduces their false positive rate, because the scenarios are tuned per segment and the segments are discovered in an unsupervised way.
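The per-segment tuning idea can be sketched as follows. This is a simplified illustration, not HSBC's actual pipeline: the segments are given up front (a clustering step would discover them), and each segment's threshold is derived from its own distribution. Segment names, amounts, and the k multiplier are all hypothetical.

```python
from statistics import mean, stdev
from collections import defaultdict

def per_segment_thresholds(history, k=3.0):
    """Tune one threshold per discovered segment: mean plus k standard
    deviations of that segment's own transaction amounts."""
    by_segment = defaultdict(list)
    for segment, amount in history:
        by_segment[segment].append(amount)
    return {s: mean(a) + k * stdev(a) for s, a in by_segment.items()}

# Hypothetical segments a clustering step might discover: students and
# corporate accounts have wildly different "normal" amounts.
history = [("student", a) for a in (20, 35, 50, 25, 40)] + \
          [("corporate", a) for a in (50_000, 62_000, 48_000, 55_000)]
thresholds = per_segment_thresholds(history, k=3.0)

# A 500 transfer is alarming for a student but routine for a corporate:
print(500 > thresholds["student"], 500 > thresholds["corporate"])   # → True False
```

One global threshold would either swamp investigators with corporate false positives or miss the student entirely; tuning per segment avoids both failure modes, which is the mechanism behind the reduction described below.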
HSBC saw a 20% reduction in false positives while still capturing every suspicious activity report that had been filed in the last few years.
How do you see AI evolving in the future?
I think for AI, you have to marry the efficacy of machines with the expertise and context of humans. The upshot is that the jobs people did in the past are going to change. So I think the view that we'll develop machines that simply do human jobs faster is a false one. Instead, I think AI will open up opportunities to do jobs in completely different ways, and that will fundamentally change the way we [as humans] work.
The AML example shows this very well. In the investigative part of AML, AI is helping accelerate the same work investigators have always done, but there is also a part, especially around risk modelling and segmentation, that is completely new and automated.