The evolution of data and disparate systems
The weird thing about evolution is that it affects us even though we are deeply aware of its mechanisms and processes. There's something unavoidable and inexorable about it. While that's true of physical processes governed by natural selection, perhaps it's less true of human culture and technology. Or is it?
Over the long history of IT and its use in and by big business, we've seen constant innovation: sometimes incremental, sometimes radically discontinuous. Consider the steady march of microprocessor performance in the former case and the sudden deep learning revolution in AI in the latter. But in both cases, what happened before and what is happening now shape the direction technology takes tomorrow.
One constant over the course of IT since 1950 has been the centrality of data. In fact, IT was first called "data processing" for a reason. Even though data has always been at the forefront, the nature of enterprise data has evolved, gradually at first and then more rapidly of late. It's fair to say that for the past 10 years we've been in the midst of a veritable "Cambrian explosion" of enterprise data. Initially our focus was on the volume and velocity of that explosion: there's just so much of it, and every day more of it arrives faster than before.
But in addition to volume and rate of change, data types have recently been proliferating at an alarming rate. We used to live in a world in which there was only one data model for the enterprise: the relational model was king, and SQL was its query language. Now we live in a world in which many data models exist and contribute value to the enterprise: relational, key-value, object-oriented, document, time series, and graph data models are everywhere you look. And it gets worse. We used to live in a world in which documents didn't really count as data, much less emails, texts, chats, meeting records, or transcripts. Now everything counts as data in the sense that everything matters for analytics, compliance, reporting, decision making, and the like.
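To make the proliferation of data models concrete, here is a minimal sketch in Python (all names and records invented for illustration) of the same single fact expressed three ways: as relational rows, as a document, and as graph triples.

```python
# One fact -- "Acme Corp is a customer managed by Dana Smith" --
# expressed in three of the data models named above. All identifiers
# and field names here are invented for illustration.

# Relational model: flat rows with a fixed schema, joined by keys.
customers_table = [
    # (customer_id, name, account_mgr_id)
    (101, "Acme Corp", 7),
]
employees_table = [
    # (employee_id, name)
    (7, "Dana Smith"),
]

# Document model: one nested, self-contained record.
customer_doc = {
    "name": "Acme Corp",
    "account_manager": {"name": "Dana Smith"},
}

# Graph model: subject-predicate-object triples.
triples = [
    ("Acme Corp", "is_a", "Customer"),
    ("Acme Corp", "managed_by", "Dana Smith"),
]

# Each model answers "who manages Acme Corp?" in its own idiom.
def manager_relational(cust_name):
    for cid, name, mgr_id in customers_table:
        if name == cust_name:
            return dict(employees_table)[mgr_id]

def manager_graph(cust_name):
    for s, p, o in triples:
        if s == cust_name and p == "managed_by":
            return o

print(manager_relational("Acme Corp"))  # Dana Smith
print(manager_graph("Acme Corp"))       # Dana Smith
```

The same information lives comfortably in each shape; the trouble starts when different systems in the same enterprise each pick a different one.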
When thinking about enterprise data, everything matters and everything is, in principle, at least, connected. What we've learned is that with the right perspective on the relevant data, many heretofore hard problems become, in fact, pretty easy to solve. When you have the right data, you don't need fancy algorithms nearly so much as you need human attention, insight, and cleverness applied to data relevance, context, and connectedness.
So what's the problem? Surely we're entering a golden age. Well, not so fast.
When we look again at IT, we find disconnected data in isolated data silos; far-flung and dispersed data archipelagoes; endless reams of incompatible formats, rates of change, and schemas. We've built networks of networks that literally span our globe and beyond. We've connected the systems into networks and the networks into an internetwork. Our systems can all "talk to each other", but what about the data?
In the world of data management, databases, and applications, we see only endless cantonization. If all the data about my enterprise is connected by virtue of it being about my enterprise, why do I have to query a dozen disconnected data silos to learn anything new? If it's all related, why do I lose valuable meaning when I switch contexts? Why is all of the data so disconnected?
To answer that question, we have to return to our old friend, evolution. Data management systems evolved in a world very unlike the world we live in today. They were all adapted to flourish in environments in which:
- Relational data was king and SQL was its query language
- The king was jealous and would have no other data models or formats before it
- Data was structured, relational, and rarely sparse
- Connections between disparate data types weren't crucial
- The rate of change was relatively slow
This is the world in which relational databases, SQL, data warehouses, SharePoint, Oracle, search engines, and MDM evolved into dominance. But conditions change, and winners and losers emerge and re-emerge. That world is gone.
We need to think hard about what kinds of data management techniques, databases, and systems can flourish and thrive in the data environment that exists in the enterprise today. We may find, as Google, LinkedIn, and Facebook did, that a data management approach that focuses on connecting the relevant data together, no matter where it lives, is the key to dominance in the world we've created and the world we will live in for the foreseeable future.
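The "connect the data where it lives" idea can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation: two hypothetical silos keep their own native formats, and a thin layer of links between identifiers lets one query traverse both without copying anything.

```python
# A minimal sketch (all data invented) of connecting silos rather
# than consolidating them: each silo keeps its own format, and a
# small graph of links lets a single query span both.

crm_silo = {  # key-value records in a hypothetical CRM
    "cust-101": {"name": "Acme Corp", "region": "EMEA"},
}
ticket_silo = [  # flat rows in a hypothetical support system
    {"ticket": "T-9", "customer_name": "Acme Corp", "status": "open"},
]

# The connection layer: links between identifiers, not copied data.
links = [("cust-101", "same_customer_as", "T-9")]

def open_tickets_for_region(region):
    """Answer a cross-silo question by traversing the links."""
    results = []
    for cust_id, rel, ticket_id in links:
        if rel != "same_customer_as":
            continue
        if crm_silo[cust_id]["region"] != region:
            continue
        for row in ticket_silo:
            if row["ticket"] == ticket_id and row["status"] == "open":
                results.append(row["ticket"])
    return results

print(open_tickets_for_region("EMEA"))  # ['T-9']
```

The question "which EMEA customers have open tickets?" spans both silos, yet neither silo had to change; only the links are new. That, in miniature, is the appeal of the connection-centric approach.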
Kendall Clark is Founder and CTO of Stardog, an enterprise data unification company. Stardog partners with industry leaders like Morgan Stanley, Bosch, and NASA. Before founding Stardog, Kendall was a philosopher, an editor and columnist in the IT industry for O’Reilly Media, and an AI researcher at UMD. He chaired the W3C working group that produced SPARQL, the world’s leading knowledge graph query language. Learn more about how Stardog is connecting data at www.stardog.com.