Big data is here to stay, although it means different things to different people, says bobsguide blogger (aka contributing editor) Andrew Witney, Barclays’ CIO, in this second instalment of his blog. In its simplest form, ‘big data’ means the mass of metadata and structured and unstructured data that is now out there thanks to social media, mobile channels, internal systems and devices connected to the ‘internet of things’; all of which can be mined with effective analytical tools to gauge customer sentiment and extract other valuable information. These big data tools are often aligned with the complementary ‘big computing’ trend, which makes computing power cheaper and faster, enabling more accurate analysis.
As technology advances, the amount of data stored about us as individuals continues to grow. With the ‘internet of things’, where devices of all types are connected to the net, now becoming a reality, there is data available on virtually everything about us: what we do, where we do it, and how we behave.
Big data offers great potential to provide major steps forward in society, and indeed for banks and businesses, but it also comes with a large red flag concerning privacy and intrusion, as Edward Snowden’s revelations about the use of metadata by security services show. The potential for abuse of this data is significant, as we have all seen in the press, but get it right and ‘big data’ techniques and analytical tools can help people get better service and help financial services (FS) firms target resources more effectively. It’s a fine line between being helpful and being intrusive, and one the FS industry will have to walk with great care, but the topic of big data cannot be ignored.
Defining Big Data
As with cloud computing, the definition of ‘big data’ is not consistent across the industry; it means many things to different people. The challenge for many organisations is to agree what the firm means by the term before green-lighting any big data initiatives. This needs to happen before any proposals or funding are approved, and a specific use should be identified before pushing ahead. A best-practice approach is to identify the business driver and assess it against the tools available for a specific project; this should be done regardless of the project, but is particularly important in regard to big data.
The most common definition of big data, and one supported by Gartner, originally measured it against four axes: volume, velocity, variety and complexity. The complexity axis has since been dropped, and the current definition states that “big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”.
The key factor is to move beyond the structured data that exists in traditional data warehouses and combine it with unstructured and semi-structured data to gain an improved overview.
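To make the idea concrete, here is a minimal sketch of combining a structured, warehouse-style extract with semi-structured event data keyed on a common customer ID. All of the data, field names and identifiers are hypothetical, and real pipelines would use proper storage and processing layers rather than in-memory strings:

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical structured data: a warehouse-style customer extract (CSV).
customers_csv = """customer_id,name,segment
C001,Alice,retail
C002,Bob,commercial
"""

# Hypothetical semi-structured data: channel events as JSON lines.
events_jsonl = """{"customer_id": "C001", "channel": "mobile", "action": "login"}
{"customer_id": "C001", "channel": "web", "action": "view_loan"}
{"customer_id": "C002", "channel": "web", "action": "login"}
"""

# Load the structured side, keyed by customer ID.
customers = {row["customer_id"]: row
             for row in csv.DictReader(io.StringIO(customers_csv))}

# Fold the semi-structured events onto the structured view.
activity = defaultdict(list)
for line in events_jsonl.strip().splitlines():
    event = json.loads(line)
    activity[event["customer_id"]].append(event["action"])

combined = {cid: {**cust, "actions": activity.get(cid, [])}
            for cid, cust in customers.items()}
```

The combined view now carries both what the warehouse already knew (name, segment) and what the semi-structured channel data adds (recent actions), which is the improved overview described above.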
Banks have been executing on at least two of these axes for many years via traditional Online Transaction Processing (OLTP) and data warehousing IT systems. However, the cost associated with these systems precludes true high-volume adoption, as the threshold of what is now accepted as high-volume continues to rise (and, as per the previous comment, traditional systems are not typically suited to supporting unstructured data at any large scale).
The Uses of Big Data
Big data technologies based on commodity hardware and open-source software are a cost-effective means of continuing to store and process high-volume structured data that would otherwise be archived or destroyed. Often this data is held alongside semi-structured data in what is commonly referred to as a ‘data lake’, where big data technologies are adopted as part of the data integration landscape. This allows more data to be stored or staged, and provides the ability to perform data discovery at the point of collection and dissemination.
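The staging idea behind a data lake can be sketched very simply: raw records are landed as-is, in a cheap file format, partitioned so they can be discovered later. This is an illustrative toy (the paths, source name and record fields are invented), not a real lake implementation:

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def stage_raw(lake_root: Path, source: str, records: list) -> Path:
    """Append raw records, unmodified, into a date-partitioned landing area."""
    partition = lake_root / source / f"dt={date.today().isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "part-0000.jsonl"
    with out.open("a") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out

# Land one raw sensor record in a throwaway lake root.
lake = Path(tempfile.mkdtemp())
path = stage_raw(lake, "card_sensor", [{"card": "****1234", "rfid_read": True}])
```

Because nothing is transformed on the way in, the same files can later serve discovery, integration or archival needs without any schema decided up front.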
One of the key challenges for banks is adopting the variety of data that is now available in semi-structured and non-relational forms. Vast amounts of data are generated every day outside the scope of traditional structured data systems, in web, social media, machine logs and sensor data (yes, your credit card probably contains a radio frequency identification (RFID) chip). Understanding what is noise, as big data guru Nate Silver might term it, and what is insight is what matters. Linking that to what you already know about your customer, product and transaction data is the key to unlocking the true value of big data for banks and other FS firms.
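A trivial sketch of the noise-versus-insight step: filter external posts down to the ones that are relevant at all, then score what remains with a tiny keyword lexicon. The posts, user IDs and word lists are entirely hypothetical, and real sentiment analysis would use far richer models:

```python
import re

# Hypothetical social-media posts; most are noise for a bank.
posts = [
    {"user": "C001", "text": "Great service from my bank's new mobile app!"},
    {"user": "C002", "text": "Anyone watching the match tonight?"},
    {"user": "C001", "text": "Bank branch queue was terrible today."},
]

POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"terrible", "awful", "slow"}

def sentiment(text):
    """Crude score: positive keywords minus negative keywords."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

# Keep only posts that mention banking at all (the rest is noise),
# then score the remainder and keep the user key for linking back
# to known customer data.
signal = [{"user": p["user"], "score": sentiment(p["text"])}
          for p in posts if "bank" in p["text"].lower()]
```

The `user` key is what allows the scored signal to be joined back to the customer, product and transaction data the bank already holds.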
Big data is often thought of as batch-based technology accessible only to a few specialists, largely due to its roots in inexpensive file-based systems that lacked the tools and techniques for optimisation, workload management and query concurrency; not to mention the main concern for banks, the lack of end-user security. However, much has changed recently as the open-source community has started to address these challenges, and big data technologies are fast becoming mainstream.
Mainstream trend: Rise of Big Data, FS Uses & Gauging Customer Sentiment
What does mainstream mean for a bank? Does it mean a bank no longer needs expensive relational database and massively parallel database technologies because of the rise of so-called big data? Probably not. In my opinion the two will evolve to complement each other, based upon the workload each needs to support and the acceptable cost of managing that workload.
Given that we have now moved to a world in which we can bring all this structured and unstructured data together, using both internal and external sources such as social media and blogs, the question is: what do we do with it all? Gaining insight into customer behaviour and sentiment is the real driver here, and it is very relevant in retail banking, where understanding customer behaviour on an individual basis is key to providing better products and service. However, a word of caution is needed. Although this data is in the public domain, we all read in the press about the concerns around privacy, so the real question is how far can we go before we become intrusive in people’s lives? Getting the demarcation lines right matters.
The wholesale banking world is less obvious in terms of potential conflicts and uses, but big data issues still exist. Customer behaviour and sentiment, for instance, tend to be better understood by commercial banks because there are fewer clients, so the need for big data is perhaps not so large. Relationships with clients tend to be more direct, with commercial and corporate banks having regular and detailed interactions with their customers. The case for big data here tends to extend beyond the direct customer into a broader understanding of the business and financial climate. So, in effect, we are looking at the behaviours of our customers at Barclays and using that as a way of tailoring our products and services.
The last use case I’d like to examine concerns a pure cost-benefit analysis. As mentioned earlier, big data is predominantly dependent on low-cost commodity hardware, so where possible, moving away from high-end and traditionally expensive data warehouse-type technologies can provide significant financial benefit. Put simply, operational savings can be made by deploying the technology.
Conclusions: Additional Big Data Opportunities Exist
While understanding our customers provides useful insight for both Barclays and its customers, it is also possible to explore how new types of data, available in near-real time from existing internal systems or external data providers, can help make decision-making data more accurate, whether that is for loans, trades or other transactions. Big data tools and techniques can be used on:
• Risk decisions (both credit and operational)
• Regulatory reporting submissions
• Fraud prevention systems
• Existing analytics, improving them with the addition of more data to enhance accuracy.
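A toy illustration of the last point, that adding data inputs can change a decision: a simple affordability check that can optionally be adjusted by extra, data-derived behavioural signals. The threshold, weights and signal names here are invented for illustration, not a real credit model:

```python
def approve_loan(income, existing_debt, extra_signals=None):
    """Toy credit decision: a traditional affordability ratio,
    optionally nudged by additional (big-data-derived) signals."""
    score = 1.0 - (existing_debt / max(income, 1))
    for signal, weight in (extra_signals or {}).items():
        score += weight  # each extra signal shifts the score
    return score >= 0.5  # hypothetical approval threshold

# Traditional inputs alone: debt ratio of 0.6 gives a score of 0.4.
base = approve_loan(50_000, 30_000)

# The same applicant, enriched with a positive behavioural signal.
enriched = approve_loan(50_000, 30_000,
                        {"stable_transaction_history": 0.2})
```

The point is not the arithmetic but the shape: the traditional decision and the enriched decision share one model, and the extra data simply widens what the model can see.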
In simple terms the more you know about your customer, and your own business, the more informed your decisions will become.
Big Data Challenges and Skillsets for Banks
The downside of feeding many more data inputs into the decision-making process is being able to assess what is relevant, but this is not the only potential challenge:
• New technologies can only help so much and ultimately an analyst is required to interpret the results that are generated to establish if the new insight is sensible or relevant to a business process.
• The traditional analyst skills associated with understanding your business, generating models, testing them and applying the results are still relevant even in a big data world. The human factor matters, but the tools used to deploy these skills have changed.
• The dilemma faced by many organisations is whether to hire new entrants who are big data technology savvy and teach them what a bank is, or alternatively to re-train some of their existing analysts to exploit the new data types and technologies. Neither of these approaches is quick or simple, because the big data ‘data scientist’ needs to be a hybrid with skills from both the business and technology arenas. Communication skills matter too.
Fortunately, the big data technology companies and the traditional structured data companies are also aware of this skillset and implementation challenge to adopting big data more widely, and are keen to do something about it. They are introducing solutions that bridge the gaps between the existing analyst communities and the new technologies. Examples are Hive, a SQL-like interface over Hadoop, and SQL-H, which gives you the ability to query Hadoop from a relational database and incorporate the answer set in a traditional SQL query. These tools can help you deploy big data effectively.
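The bridging point is that the analyst's SQL carries over largely unchanged. As a stand-in, the sketch below runs a familiar aggregate query against an in-memory SQLite database; Hive exposes a similar SQL dialect over files in Hadoop, so the same analyst skills apply there (table and column names here are hypothetical):

```python
import sqlite3

# In-memory SQLite stands in for the SQL layer an analyst already knows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (customer_id TEXT, page TEXT)")
conn.executemany("INSERT INTO weblog VALUES (?, ?)",
                 [("C001", "loans"), ("C001", "loans"), ("C002", "fx")])

# The same kind of aggregate an analyst would write against a warehouse,
# and could write in HiveQL against semi-structured logs in Hadoop.
rows = conn.execute(
    "SELECT customer_id, COUNT(*) AS hits FROM weblog "
    "GROUP BY customer_id ORDER BY hits DESC"
).fetchall()
# rows -> [('C001', 2), ('C002', 1)]
```

This is exactly the gap-bridging described above: the storage layer changes underneath, while the query language the analyst community already speaks stays the same.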