Graph Databases: The Secret To Effective, Real-Time, Fraud Prevention

By Emil Eifrem | 15 December 2015

Banks and insurance companies lose billions of dollars every year to fraud. Traditional methods of fraud detection play an important role in minimising these losses, but as the criminals develop better ways to evade detection, financial services companies have for a long time been playing a game of catch-up.

Finally, the match between fraudster and bank could become more even, thanks to a new approach to working with the big, complex datasets fraud typically generates: graph databases. Graph-based techniques offer new methods of uncovering cyber scams with a high level of accuracy, and may be capable of stopping advanced fraud scenarios in real time.

No fraud prevention measures can ever be perfect. Nonetheless, significant improvement in beating criminality can be achieved by looking not just at individual data points in a dataset, but at the connections linking them. Typically, these connections go unnoticed until it is too late, which is a problem because they yield the best clues.

Understanding the connections between data, and deriving meaning from these connections, doesn’t mean gathering new data, incidentally. Very useful insights can be drawn from already existing data, simply by reframing the problem and looking at it in a new way, as a graph.

Unlike most other ways of looking at data, graphs are designed to express and capture relationships. That means they can uncover patterns that are otherwise difficult to detect using traditional representations, such as SQL tables. An increasing number of companies are using graph databases to solve a variety of connected data problems, including fraud detection: PayPal already uses graph techniques to perform fraud detection on eBay and StubHub transactions in real time, for example. IDC estimates that this has already saved PayPal more than $700 million.

So how can graphs help? Let’s focus on the case of first-party bank fraud, one of the more common types of fraud to affect banking institutions, although often the least well-publicised. In first-party fraud, the only party that gets hurt is the bank, so cases of this form of money siphoning tend not to get the same attention from the media and general public that money-laundering and identity theft cases do.

Alas, that also makes them easier to pull off. First-party fraud involves fraudsters who obtain credit cards, loans, overdrafts and unsecured banking credit lines but who have no intention of ever paying the money back. It is a serious problem for banking institutions; banks lose tens of millions every year to first-party fraud, and it’s estimated that 10%-20% of unsecured bad debt at leading US and European banks is misclassified as due to other causes, when it’s actually first-party fraud.

Organised crime calls for organised thinking

The surprising magnitude of these losses is the result of two factors. The first is that first-party fraud is difficult to detect in advance. Fraudsters working this scam behave very similarly to legitimate customers until the moment they ‘bust out’, cleaning out all their accounts and promptly disappearing. A second factor is the exponential nature of the relationship between the number of participants in the fraud ring and the overall monetary value controlled by the operation. This connected aspect is a feature often exploited by organised crime.

The good news is that while this characteristic makes these schemes potentially very damaging, it also renders them particularly susceptible to graph-based methods of fraud detection.

While the exact details of each first-party fraud collusion vary, a fraud ring will involve two or more people sharing a subset of legitimate contact information, for example phone numbers and addresses, and combining them to create a number of synthetic identities. With these fake IDs, they open new accounts and then add further products: unsecured credit lines, credit cards, overdraft protection, personal loans, etc. The accounts are used properly at first, with regular purchases and timely payments, so that the banks increase the credit lines over time in response to this observed credit behaviour.

Until bust-out day – when, maxing out all their credit lines, the gang gets as much money out as it can, then disappears. Sometimes, fraudsters will bring all of their balances to zero using fake cheques, doubling the damage. Collections processes ensue, but agents are never able to reach the fraudsters, who are then able to start all over again – and yet more uncollectible debt is written off.

And it can all be done with little real ‘investment’ on the part of the perpetrators. Take two fraudsters, one living at 1 Scam Alley, Scamsville, who owns a prepaid phone, and another living at 1 Swindle Street, Swindle Town, with another such device. Sharing only phone number and address (so two pieces of ID), this ring can combine these to create 2² = 4 synthetic identities with fake names. With 4-5 accounts for each synthetic identity, that makes a total of 18 accounts. Assuming an average of £4,000 in credit exposure per account, the bank’s loss could be £72,000 – and often a lot more.
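The arithmetic in this example can be checked with a short sketch. The Python below is purely illustrative: the variable names are invented, and the split of accounts across identities (two with four accounts, two with five) is an assumption chosen to match the stated total of 18.

```python
from itertools import product

# Identifiers taken from the example in the text; the labels are illustrative.
phones = ["prepaid phone A", "prepaid phone B"]
addresses = ["1 Scam Alley, Scamsville", "1 Swindle Street, Swindle Town"]

# Every (phone, address) pairing can back one synthetic identity.
identities = list(product(phones, addresses))

# Assumed split: the text says 4-5 accounts per identity and 18 in total.
accounts_per_identity = [4, 4, 5, 5]
total_accounts = sum(accounts_per_identity)

# Average credit exposure per account, in pounds, as stated in the text.
average_exposure_gbp = 4_000
total_exposure_gbp = total_accounts * average_exposure_gbp

print(len(identities), total_accounts, total_exposure_gbp)  # 4 18 72000
```

Adding a third fraudster with a third phone and address would raise the number of possible pairings from 4 to 9, which is why a ring's exposure grows so much faster than its headcount.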

Catching culprits

Catching fraud rings and stopping them before they cause damage is a challenge. One reason is that traditional methods of fraud detection are not geared to look for the rings created by shared identifiers. Standard techniques, such as looking for deviations from normal purchasing patterns, rely on discrete data, not connections. Discrete methods are useful for catching fraudsters acting alone, but fall short in their ability to detect rings. Furthermore, such methods are prone to false positives, which create undesired side effects in customer satisfaction and lost revenue opportunity.

This is where graph databases come in. Uncovering rings with traditional relational database technologies requires modelling the data above as a set of tables and columns, then carrying out a series of complex joins and self-joins. Such queries are complex to build and expensive to run, so scaling them in a way that supports real-time access poses significant technical challenges, with performance becoming exponentially worse not only as the size of the ring increases but also as the total dataset grows.

Graph databases have emerged as an ideal tool for overcoming such hurdles. Combined with powerful new query languages like Cypher, they provide simple semantics for detecting rings in the graph, navigating connections in memory. That means existing fraud detection infrastructure can easily be augmented to support ring detection: run appropriate entity-link analysis queries in a graph database, and run checks at key stages in the customer and account lifecycle, such as at account creation, during an investigation, when a credit balance threshold is hit or when a cheque bounces.
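To make the idea of entity-link analysis concrete, here is a minimal sketch in plain Python rather than Cypher. It is not how any particular graph database implements ring detection; it simply treats account holders, phone numbers and addresses as nodes, links each holder to the identifiers they use, and walks the connections to find groups of holders tied together by shared details. The records and the find_rings function are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical account-holder records: (holder id, phone, address).
holders = [
    ("holder-1", "555-0101", "1 Scam Alley"),
    ("holder-2", "555-0101", "1 Swindle Street"),
    ("holder-3", "555-0102", "1 Swindle Street"),
    ("holder-4", "555-0199", "9 Honest Road"),
]


def find_rings(records):
    """Group holders linked, directly or transitively, by a shared phone or address."""
    # Build an undirected graph between holder nodes and identifier nodes.
    graph = defaultdict(set)
    for holder, phone, address in records:
        for identifier in (("phone", phone), ("address", address)):
            graph[("holder", holder)].add(identifier)
            graph[identifier].add(("holder", holder))

    seen = set()
    rings = []
    for holder, _, _ in records:
        start = ("holder", holder)
        if start in seen:
            continue
        # Breadth-first traversal of everything reachable from this holder.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            if node[0] == "holder":
                members.append(node[1])
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # A lone holder sharing nothing with anyone is not a ring.
        if len(members) > 1:
            rings.append(sorted(members))
    return rings


rings = find_rings(holders)
print(rings)  # [['holder-1', 'holder-2', 'holder-3']]
```

Note that holder-1 and holder-3 share nothing directly; they are connected only through holder-2. That kind of indirect link is what a chain of self-joins struggles with and what a traversal finds naturally.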

Uncovering connections is the key

Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings during or even before the bust-out. As business processes become faster and more automated, the time margins for detecting fraud are becoming narrower, increasing the call for real-time solutions, which graphs can enable.
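As a rough illustration of tying a check to a lifecycle event, the sketch below runs a lookup at account creation and flags applications whose contact details are already in use by other holders. Everything here is hypothetical: the IdentifierIndex class, the on_account_creation hook and the review threshold of two linked holders are assumptions for the sake of the example, not a recommended policy or a real product API.

```python
from collections import defaultdict


class IdentifierIndex:
    """Hypothetical in-memory index from contact identifiers to the holders using them."""

    def __init__(self):
        self._holders_by_identifier = defaultdict(set)

    def register(self, holder, phone, address):
        self._holders_by_identifier[("phone", phone)].add(holder)
        self._holders_by_identifier[("address", address)].add(holder)

    def linked_holders(self, phone, address):
        """Return existing holders that share the given phone or address."""
        linked = set()
        for key in (("phone", phone), ("address", address)):
            linked |= self._holders_by_identifier.get(key, set())
        return linked


def on_account_creation(index, holder, phone, address, review_threshold=2):
    """Check a new application against existing holders, then record it."""
    linked = index.linked_holders(phone, address)
    index.register(holder, phone, address)
    return {
        "holder": holder,
        "linked": sorted(linked),
        # Assumed rule: refer for review once enough existing holders are linked.
        "needs_review": len(linked) >= review_threshold,
    }


index = IdentifierIndex()
first = on_account_creation(index, "holder-1", "555-0101", "1 Scam Alley")
second = on_account_creation(index, "holder-2", "555-0102", "1 Swindle Street")
third = on_account_creation(index, "holder-3", "555-0101", "1 Swindle Street")
print(third["linked"], third["needs_review"])
```

The same hook could equally be attached to the other events mentioned above, such as a credit balance threshold being hit or a cheque bouncing.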

A second advantage, alongside real-time detection, is the value of connected analysis. Criminals have learned to attack systems where they are weak. Traditional technologies, while still suitable and necessary for certain types of prevention, are not designed to detect this class of more elaborate fraud. Graph databases provide a unique ability to uncover a variety of important fraud patterns, either in groups or on an individual basis. That’s because deliberately hidden connections become obvious when looked at with graph queries. Consider their use in high-impact fraud investigations such as ‘Swiss Leaks’, a dataset in which new cases are still being uncovered months on.

Graph databases are emerging as the ideal solution for finding such hidden patterns, and at scale. Last year, analyst group Forrester Research predicted that just over a quarter of enterprises would be using these databases by 2017. Meanwhile, graph databases have quietly been powering the Web for some time – with leading consumer and ecommerce sites owing much of their climb to dominance to using graph technology to capture and rapidly exploit online data relationships.

But while early graph database converts like Google and LinkedIn had to build their own in-house graph data stores from scratch, off-the-shelf graph databases are now available to any business wanting to exploit data.

Added to this, there are many types of fraud, and graphs work successfully across them all. Insurance fraud of the kind known as ‘whiplash for cash’, for instance, attracts sophisticated criminal rings which are very good at slipping through most fraud detection nets – and which all depend on the connections and rings that graphs are uniquely good at detecting.

To sum up, graph databases are the ideal enabler for efficient fraud detection solutions for any bank or financial institution, as well as any forensic accountancy or regulatory body. That, plus their new-found widespread availability, means there’s no longer any excuse for not using them.

By Emil Eifrem, co-founder and CEO of Neo Technology.
