“People say data infrastructure might not be sexy, but get it right, and you’ll go far”

By David Beach | 11 October 2017

Billy Bosworth, CEO of DataStax, sat down with bobsguide to talk about the past, present and future of data challenges and how a bit of forward thinking can stave off the worst of regulatory compliance.


How did you come to work for DataStax?

I started out 25 years ago with a computer science degree and, for the first half of my career, I specialised in client-server relational database systems and then hardcore application development, mostly for companies rather than vendors. That company perspective showed me what it was like trying to get your head around new technology. My career then morphed into product management, which essentially involved taking the skills learned on the software side and applying some business sense to figure out how to guide and direct a product on the vendor side.

I then worked for a tools provider that made it easier to run all of a company’s relational systems: Oracle, SQL Server, DB2 and Sybase. We could sense that a sea change was about to happen in the data landscape, so we picked several new technologies to develop for, and one of those was Apache Cassandra, an open source project. Through that partnership I met the founders of DataStax, and they asked me to join in 2011, one year into the company’s life.

Cassandra is an Apache open source project. It was designed to be a fully distributed database. This was rare back then because databases, by their inherent nature, did not distribute well. They were designed to be monolithic in their core architecture. For a large part of our digital history, it was fine to keep scaling that technology up in size, until the explosion of mobile. Mobile really saw us transition from the LAN to the internet; everything changed, and the scale changed.

We tried to make the old relational systems scale in this new world, and it just wasn’t working; you were just pushing a string uphill. Apache Cassandra grew out of two earlier designs. One was Dynamo, from Amazon, which provided the architecture. The other was BigTable, from Google, which had more to do with how to query the data. They were the parents, if you will, of Cassandra. DataStax, from its very beginning, was the company that promoted and contributed the most to this Apache open source project.

What technology is DataStax currently using in its products?

We’ve been through three eras as a company. In the first era, 2010-2013, the landscape was furiously cloudy with so many technological options, so the first challenge was to make sure Apache Cassandra was considered a viable project. The second era, 2013-2016, was about making sure the world knew about this technology. We realised that the Cassandra architecture was fabulous, and that it was a necessary condition but not a sufficient condition for what companies needed out of their data. That led us down the path to adding Apache Spark, which is more for analysis; Solr, which is for searching large datasets with advanced indexing; and Graph, which analyses the connections among entities as opposed to the entities themselves. This all comes together in one unified platform called DataStax Enterprise. The third era, which we’re currently in, is about making that platform accessible to the mainstream.

The core of the DataStax Enterprise version of Cassandra is the geo-distributed, real-time backbone. This database can be accessed in five or six geographies, which is hard enough for many databases. The search and analytics technology is a little more commonly understood, but Graph is different and a little tricky to understand. Graph functions more like your brain, analysing connections and pathways.

How is this platform used in the financial sector?

We see it used in a lot of different ways. One example is ING, where the bank is increasing and improving the interaction that the consumer has with the bank for a better 360-degree view of the consumer. Another bank is Macquarie in Australia. They have a very enlightened way of viewing IT. They will tell you that Macquarie doesn’t compete with other banks; it competes with the last app a customer opened on their phone. They do this by providing rich, contextual interactions with their customers.

We have other banks who use us for more hardcore back-end things like ticker information. If you’re looking at a global equity system and you want to track all the trades from around the world, that’s a massive amount of ingestion. It is almost akin to Internet of Things data, because it’s just a continuous stream of data.

Another example in the US is Intuit. The company’s TurboTax system is the self-filing mechanism for the vast majority of Americans, and all of that information is powered by DataStax Enterprise.

Is there a limit to our capacity to process the massive amounts of data in real time, not only from a technical standpoint but also from the point of view of needing a human touch?

From a technological perspective, we’ve got a long way to go before we’re at saturation point. The next step in our evolution is finding the signal through the noise: working out which interactions are more important and valuing them accordingly.

As far as the human element goes, when we talk about engagement I don’t want us as a company to lose sight of the human customer, albeit aided by digital. To illustrate this, if a member of my family has to have surgery, I want to deal with a human doctor who makes my experience more pleasant by having all the medical data at hand already. It’s as much about the digital reality you bring to bear on a human interaction as it is about a digital engagement with a human, and we have a lot of room to go there.

The reason I get excited about what we do is that, although people will say infrastructure isn’t sexy, the reality is that there is no room for improvement without the proper data structure underneath you. We allow companies to genuinely transform how they engage with their customers. We’re nowhere near that tipping point yet, and even when we get the signal through the noise, it opens up a whole new realm of opportunity. The trickiest part is, and always will be, data. It has to live beyond the transaction, so where should it be put, how should it be governed and how should it be stored? These questions are very important to us.

How prepared are you for new regulations in 2018?

We want to ensure companies have the right infrastructure so that, as new regulations come, they have the flexibility to act on them quickly. For instance, Macquarie has architected its system such that it’s quite easy for them to open it up to other players come PSD2. If a company hasn’t planned ahead and 10x-ed the potential future load on its systems, that will slow it down immensely. Therefore, when we work with customers we really try to make sure they understand not just the problem of today but the problems of the future through forward design. Often, this is the same with regulations. We can kind of see what regulations will be coming: they’ll be around data protection and privacy.

GDPR is another example, where companies have systems in so many silos that they have lost visibility of the entity called a customer. If you are in that situation, GDPR is going to be very troubling for you. Conversely, customers who have worked with us and have a good forward understanding will simply have to open up visibility when GDPR comes in. Those are a couple of examples where we see that a little bit of forward thinking in design will prepare you for most regulations, because they aren’t hard to figure out.

Getting the architecture right is so critical to future planning.

In that case, do digital natives have an advantage over legacy companies in terms of digital adaptability?

This is an area that I’m super passionate about personally. Companies today have the choice to make key architectural decisions that can empower them with data autonomy and avoid the challenges of a rising data oligarchy that could (without prudent planning) control a high percentage of the world’s data. Conversely, a world of enterprise data autonomy would allow many companies to burst with innovation and corporate growth - and that is something we at DataStax work hard to enable every day.

The advantage that the traditional companies have is that they have seen a lot of patterns come and go, so they have a different level of understanding of a customer base, and they even have some advantages in their assets.

The human experience isn’t dead either. There’s going to be a lot of opportunity for traditional enterprises not only to be relevant but also to thrive in this new world. People will want choice, and choice associated with things that they enjoy. Some of the brands are going to become smaller and more personal. I don’t want to chop anyone down; it’s a very different mentality. I want others to rise up. I think the oligarchy is setting the bar, and companies like DataStax are equipping the traditional companies to meet that bar.