By Marc Alvarez, senior director for reference data infrastructure, Interactive Data
It’s a phrase that’s been hard to avoid over the last year: ‘big data’. But what exactly does it mean? Big data describes a phenomenon that firms across a wide range of industries are finding increasingly difficult to manage. In the financial sector, it refers to the ever-increasing growth in market data volumes, the analytics surrounding them, and the number of market transactions. Sheer volume is, however, only part of the issue firms face. The data is also becoming referentially more sophisticated as firms link together a growing array of datasets, build historical databases that track ownership and security changes, and confirm the details of each item at a more granular level.
So why is big data making headlines now? The answer can be found in the growth in market data volumes and demands for their use and reuse, which have outpaced even the most bullish expectations. In a single generation, the volume of market transactions has increased by several orders of magnitude, and is projected to push global spend on financial data alone to around $28 billion in 2012, according to Burton-Taylor’s ‘Financial Market Data/Analysis Global Share Segment Sizing 2011’ report.
Take, for example, the average daily number of pricing ticks processed by Interactive Data. In October 2011, across all traded asset classes in North America, an average of 10.7 billion ticks were processed every trading day. That translates into about 19.3 terabytes of data per year. However, peaks can and do occur. For example, on August 8, 2011, more than 26 billion ticks were processed.
From an infrastructure perspective, that growth suggests the need for capacity to handle at least three times the daily average, which will boost costs measurably. Factor in that many firms purchase data feeds from several providers and one can begin to see the size of the problem.
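The arithmetic behind that provisioning rule can be sketched directly from the figures above. This is a minimal illustration: the tick counts come from the article, while the three-times factor is the rule of thumb suggested here, not any vendor’s specification.

```python
# Illustrative capacity sizing using the tick volumes cited in the text.
AVG_TICKS_PER_DAY = 10.7e9   # October 2011 North American daily average
PEAK_TICKS_PER_DAY = 26e9    # August 8, 2011 peak day

# Observed peak runs at roughly 2.4x the daily average...
peak_ratio = PEAK_TICKS_PER_DAY / AVG_TICKS_PER_DAY

# ...so provisioning at three times the average leaves headroom above
# the worst day seen so far.
provisioned = 3 * AVG_TICKS_PER_DAY

print(f"Peak-to-average ratio: {peak_ratio:.2f}x")
print(f"Provisioned capacity: {provisioned / 1e9:.1f} billion ticks/day")
```

Multiply that headroom by the number of vendor feeds a firm consumes and the infrastructure bill scales accordingly.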
With great complexity comes great responsibility
There are indications that firms are at last taking data management seriously. One visible sign of this is the growing number and stature of Chief Data Officers (CDOs). You need look no further than John Bottega’s recent move from the Federal Reserve Bank of New York to Bank of America for evidence that the CDO is increasing in importance and visibility within major firms.
Regardless of job titles, these teams face a daunting task: firms have to maintain legacy data stores while supporting multiple silos and sources of new data, and while integrating numerous formats.
While the availability of a multiplicity of data sources may be a boon for those creating multifaceted trading strategies or feeding voracious risk and pricing models, it amplifies the ‘big data problem’ for operations teams.
Firms need to normalise and analyse this increasingly granular information, separating the useful information from the noise. A data industry maxim is that firms will spend four to five times as much on processing and integrating data as they spend on the content itself. And this cost continues to grow.
At the same time, regulatory changes are forcing firms to source and report ever larger amounts of trade data, as well as to adopt higher-quality – and usually data-hungry – risk and pricing models. Investors are making similar demands of their asset managers.
The increase in the volume and complexity of market data is primarily driven by three overlapping factors: market evolution, regulatory requirements, and the increasing sophistication of pricing and risk models.
The growth of big data points to an important economic trend: increased connectivity and regulatory changes, such as the EU’s transformative Markets in Financial Instruments Directive, are opening up access to additional markets at a lower cost per trade. While share volumes remain fairly stagnant, the number of transactions continues to expand rapidly, due in part to the fragmentation of markets.
Another important driver is algorithmic trading. The volume of trades initiated by firms using these strategies is vastly higher than those placed by asset managers pursuing other types of strategies.
Investor demand for greater transparency from hedge funds and asset managers is another factor. Pension fund managers and insurers are increasingly using risk attribution and return attribution analyses to inform their allocation decisions and to monitor asset concentrations. All this requires data of a far more complex nature than simply adding more prices.
Under Section 727 of the Dodd-Frank Act, over-the-counter (OTC) derivatives market participants will be required to report swap data, including price and volume, as soon as technologically practicable after execution of the swap. Dodd-Frank forces hedge funds and many other investment partnerships to register as investment advisors, putting them under the supervision of the Securities and Exchange Commission (SEC). The SEC and the Commodity Futures Trading Commission have made plain their desire for these firms to disclose more quantitative information, despite the fact that the government currently lacks the resources to analyse it. That could change as the new Office of Financial Research, the data-gathering arm of the Financial Stability Oversight Council, expands its activities.
Regulators’ increased interest in vetting firms’ risk-based capital calculations, stress test results, Value at Risk (VaR) computations and other metrics will also boost demand for data to feed these exercises, as will regulators’ growing scrutiny of the inputs to firms’ Level 3 asset valuations.
The pricing models for Level 3 assets are not the only ones becoming more data-intensive. In the past decade, risk models have come to require greater quantities of more complex data. For example, stochastic Monte Carlo VaR calculations, which at one time took firms’ supercomputers all night to run, are now being cranked out much more frequently, thanks to increases in off-the-shelf computer power and a steadily growing library of high-performance software tools.
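To illustrate why such calculations are compute- and data-hungry, here is a minimal one-day Monte Carlo VaR sketch. The normal-returns assumption, portfolio value, volatility and path count are all illustrative, not drawn from any firm’s actual model, and production systems would feed far richer market data into the simulation.

```python
import random

def monte_carlo_var(mu, sigma, portfolio_value,
                    confidence=0.99, n_paths=100_000, seed=42):
    """One-day Monte Carlo VaR under an illustrative normal-returns model:
    simulate daily returns, convert to losses, and read off the loss that
    is exceeded on only (1 - confidence) of simulated days."""
    rng = random.Random(seed)
    losses = sorted(-portfolio_value * rng.gauss(mu, sigma)
                    for _ in range(n_paths))
    return losses[int(confidence * n_paths)]

var_99 = monte_carlo_var(mu=0.0, sigma=0.02, portfolio_value=1_000_000)
print(f"1-day 99% VaR: ${var_99:,.0f}")
```

Even this toy version draws 100,000 paths per run; repeating it intra-day across thousands of positions shows how quickly the data and compute demands compound.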
Other data-hungry models have come to the forefront thanks to the ascendance of the credit derivatives market. The so-called Merton or Firm Value models, once the province of Moody’s KMV analytics and a few others, are now widely available. Using them to price credit or determine appropriate hedge ratios often requires enormous amounts of equity data.
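A minimal sketch of the distance-to-default quantity at the heart of Merton-style firm value models helps show where that data appetite comes from. The inputs below are hypothetical; real implementations must first back out asset value and asset volatility from large histories of observed equity prices, which is precisely the data-intensive step.

```python
import math

def distance_to_default(assets, debt, mu, sigma_assets, horizon=1.0):
    """Merton-style distance to default: the number of asset-volatility
    standard deviations between the expected asset value and the debt
    barrier over the horizon (in years). All inputs are illustrative."""
    num = math.log(assets / debt) + (mu - 0.5 * sigma_assets ** 2) * horizon
    return num / (sigma_assets * math.sqrt(horizon))

dd = distance_to_default(assets=120e6, debt=100e6, mu=0.05, sigma_assets=0.25)

# Under the model's normality assumption, default probability is Phi(-DD).
pd = 0.5 * math.erfc(dd / math.sqrt(2))
print(f"Distance to default: {dd:.2f}, implied default probability: {pd:.1%}")
```

The formula itself is simple; the expense lies in estimating the asset-level inputs from equity data across an entire credit portfolio.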
The challenges of big data
The common denominator behind these drivers is the need to acquire, manage, analyse and report on large datasets on as close to a real-time basis as possible. Big data, therefore, calls for the ability to adopt a real-time analysis and reporting approach that is both highly empirical and easy to monitor for risk purposes.
Put into economic terms, this trend points to some fairly predictable consequences. Significant investment in these competencies is required to set the baseline capability to compete and thrive in a world of big data. The challenge this presents is twofold: yesterday’s technology almost certainly has a limited shelf life, and research and development will become increasingly important to capitalise on big data opportunities.
In the capital markets, this means mastering the ability to manage and analyse content across the security master, corporate actions, real-time and time series pricing and customer data domains, while at the same time complying with new regulatory demands. A key approach to achieving this goal is to deploy the power of statistical and empirical methods across a much broader universe of datasets and business functions.
When looked at as a whole, this positions big data at a very interesting intersection between data content, technology, and analytical capability. It involves the pre-processing of data before a firm makes strategic decisions based on it. The pressure to keep pace and thrive in this environment is going to place a premium on having the right software and other infrastructure in place. For data suppliers this means being able to offer a far wider (as well as deeper) universe of content on demand and in a form that is easily consumed. The historical partnership between supply and application of data is going to become even more important and sophisticated as a result.
One impediment to fully optimising data management is that financial institutions rarely rely on a single market data provider. Multiple data sources are used to support procedures such as validating vendor data against a user-defined set of tolerances. While this is viewed as prudent and may be required by internal risk management or compliance departments – for occasions when the data from one provider seems suspect, or the provider has technical difficulties – having two or more providers makes managing the data that much harder.
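The tolerance-based validation described above can be sketched in a few lines. The vendor feeds, instrument names and 0.5% relative tolerance below are all illustrative, standing in for whatever thresholds a firm’s risk or compliance policy sets.

```python
def validate_prices(primary, secondary, tolerance=0.005):
    """Flag instruments whose prices from two vendor feeds diverge by more
    than a relative tolerance, or are missing from the secondary feed.
    Feeds are plain {instrument: price} dicts for illustration."""
    exceptions = []
    for instrument, p1 in primary.items():
        p2 = secondary.get(instrument)
        if p2 is None:
            exceptions.append((instrument, "missing from secondary feed"))
        elif abs(p1 - p2) / p1 > tolerance:
            exceptions.append((instrument, f"divergence {abs(p1 - p2) / p1:.2%}"))
    return exceptions

# Hypothetical feeds from two providers.
vendor_a = {"XYZ Corp 5% 2020": 101.25, "ABC 10yr note": 98.40}
vendor_b = {"XYZ Corp 5% 2020": 101.20}
for exception in validate_prices(vendor_a, vendor_b):
    print(exception)
```

Every exception raised this way needs investigation, which is one concrete reason multi-vendor sourcing multiplies the operational workload rather than simply doubling the feed bill.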
The real challenge will be in deploying the right technology and analytical capability to produce actionable information out of these massive datasets, and doing it in a timely manner, repeatedly, intra-day. Those that solve this problem most effectively could find themselves with a significant competitive advantage over their more slow-footed rivals.