Lessons learnt from over a decade of Hadoop
Back in 2006, Apache Hadoop emerged and soon began to revolutionize the nascent world of Big Data. It is one of the key factors that helped shape a new industry and -- together with the cloud -- helped drive a raft of new consumer industries and business services.
But the data lakes of Hadoop became a challenge to manage, and many Big Data and analytics projects became more of a quagmire than a sparkling lake of truth. What's more, the number of compute engines proliferated. They were operationally complex and demanded specialized skills to maintain. Those data lakes turned into collections of disparate compute engines, sharing the same storage whilst running disparate workloads. It became a mess, and managing it with the tools available was no picnic.
Assumptions were made, such as that an architecture built for global-scale internet companies of the time could just as easily be implemented by 1,000+ employee enterprises across all industries. Moreover, operating Hadoop at non-web scale proved difficult without massive engineering teams.
Effectively, the Hadoop world was a fantastic kitchen offering the world's best ingredients -- ingredients that demanded the very best chefs to make something of them. Not everyone in the kitchen had access to those Michelin-starred chefs.
What this situation has taught the industry is that ease of use ultimately overrides scalability, because most organizations need easy-to-use technology rather than global scale. What's interesting is that cloud-native offerings deliver much of that ease of use without compromising on scalability -- a trade-off that continues to plague on-premises Hadoop.
Hadoop, once synonymous with data lakes, was sadly not the ideal platform for them. Given the variety and volume of data they often contain, data lakes need other services that allow data discovery and manipulation and that mesh well with their scale.
The rise of a new king: Cloud
The cloud has clearly begun to win the Big Data architecture battle over on-premises infrastructure and Hadoop. The cloud is seen as offering more agility and flexibility; Hadoop clusters are not seen as the technology of the future. Yet the cloud is also still a foreign country for many enterprises. They do things differently there, and what worked before 'at home' can fail, become too costly, or slow down business operations when inexpertly rushed into.
When HPE acquired MapR, it signaled the end of Hadoop as a dominant force. Like many technologies before it, Hadoop is becoming legacy. The cloud is the ascendant platform for bringing Big Data and analytics strategies to glorious fruition.
There has been a shift in demand for the technology -- the Hadoop model of co-locating compute and storage is no longer in vogue. Cloud technologies are built around massive object stores, spinning up compute on demand and making heavy use of virtualization.
Brute economics generally wins out over all other factors. It took a lot of skilled people to manage the on-premises Hadoop clusters that organizations relied on, but the major cloud offerings are seductively priced. With pre-built distributed computing services, they are vacuuming up the market and shifting control from users to platform vendors. It's not that the cloud platforms are so very different from the old order that Hadoop represented; the difference is that operating the service now falls to the cloud vendor rather than resting on the shoulders of the user organization. That reduction in time, effort, and cost gives cloud-using organizations a big breather.
The old and the new, together
There will be organizations less interested in making the cloud migration. Indeed, cloud vendors don't mind -- they offer solutions that reach back into on-premises environments, such as AWS Outposts and Google Cloud's Anthos.
With so much already invested in Hadoop, a large number of organizations will wait as long as they can before moving to other platforms, squeezing their existing investments. Hadoop won't disappear fast. But in time it will become another legacy technology, whilst the rest of the corporate world zooms forward, floating more serenely on its clouds -- enjoying a smoother ride to data nirvana.
Shivnath Babu is co-founder/CTO at Unravel (and Adjunct Professor of Computer Science at Duke University)