Data lakes and brick walls: big data predictions for 2017
There's been a lot of talk about big data in the past year. But many companies are still struggling to implement big data projects and get useful results from their information.
In this part of our series on 2017 predictions, we look at what the experts think will affect the big data landscape in the coming year.
Steve Wilkes, co-founder and CTO at Striim, believes we'll see increasing commoditization, with on-premises data lakes giving way to cloud-based big data storage and analytics built on vanilla open-source products like Hadoop and Spark.
He also believes that security concerns will drive an increase in systematic data classification, encryption and obfuscation across all long-term data storage. "Streaming data preparation will become paramount as fast, critical enterprise applications increasingly require that data be filtered, transformed, aggregated and enriched before landing in an underlying data store," says Wilkes.
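Wilkes doesn't tie this to any particular stack, but a minimal sketch of the pattern he describes is possible with Spark Structured Streaming, which the article's experts repeatedly mention. In the sketch below, the Kafka topic, JSON schema and file paths are all illustrative assumptions, not part of any quoted product.

```python
# A minimal sketch of streaming data preparation: events are filtered,
# transformed, enriched and aggregated before landing in long-term storage.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("stream-prep").getOrCreate()

schema = (StructType()
          .add("device_id", StringType())
          .add("reading", DoubleType())
          .add("event_time", TimestampType()))

# Hypothetical Kafka source carrying JSON-encoded sensor events.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "sensor-events")
       .load())

# Transform: parse the JSON payload into typed columns.
events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Filter out malformed readings, then enrich with static reference data.
devices = spark.read.parquet("/ref/devices")  # illustrative lookup table
clean = events.filter(F.col("reading").isNotNull() & (F.col("reading") > 0))
enriched = clean.join(devices, "device_id", "left")

# Aggregate per device per minute before the data lands in the store.
prepared = (enriched
            .withWatermark("event_time", "5 minutes")
            .groupBy(F.window("event_time", "1 minute"), "device_id")
            .agg(F.avg("reading").alias("avg_reading")))

# Land the prepared stream in the underlying store (Parquet here).
query = (prepared.writeStream
         .format("parquet")
         .option("path", "/lake/prepared")
         .option("checkpointLocation", "/lake/_chk")
         .outputMode("append")
         .start())
query.awaitTermination()
```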
The importance of open source is echoed elsewhere. "In 2017 big data will begin to cross a chasm into the mainstream, in large part resulting from the popularity of Hadoop and Spark," says Kunal Agarwal, CEO of Unravel Data. "Companies will use big data for mission-critical needs when running their data stacks. These are the same companies that once had issues with the security threat propaganda that plagued Hadoop and Spark; that's now in the past. We have only touched the tip of the iceberg for what Hadoop and Spark are capable of offering when running mission-critical jobs on a high-performance big data platform".
Agarwal also sees a shift of workloads to the cloud: "In 2017 we will see more big data workloads moving to the cloud, while a large number of customers who traditionally have run their operations on-premises will move to a hybrid cloud/on-premises model. We can also expect to see companies using the cloud not just for data storage, but for data processing. And we'll see mainstream adoption of the cloud, which will give companies confidence in running their big data clusters in the cloud, and not just on-premises".
Using tools like Hadoop will help overcome problems with legacy applications. "Organizations trying to scale their existing BI platforms to big data size will hit a brick wall with legacy analytics tools," says Sushil Thomas, co-founder and CEO of Arcadia Data. "Research firms like Forrester have seen increasing interest from enterprises not only moving their data to Hadoop, but also running analytical applications on Hadoop clusters. Running BI natively on Hadoop allows analysts and business users to drill down into raw data, run faster reports and make informed decisions based on real-time data instead of abstracts".
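Thomas doesn't name a specific engine, but as a rough illustration of what BI-style drill-down directly on raw Hadoop data can look like, here is a minimal Spark SQL sketch; the HDFS path, table name and columns are invented for the example.

```python
# Illustrative sketch: run report and drill-down queries directly over
# raw files in Hadoop, with no separate ETL into a BI warehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bi-on-hadoop").getOrCreate()

# Register the raw event files as a queryable view (assumed path/columns).
spark.read.parquet("hdfs:///raw/sales").createOrReplaceTempView("sales")

# A typical top-level report: revenue by region.
spark.sql("""
    SELECT region, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
""").show()

# Drill down into the raw rows behind one aggregate cell.
spark.sql("""
    SELECT order_id, customer_id, amount, order_ts
    FROM sales
    WHERE region = 'EMEA'
    ORDER BY order_ts DESC
    LIMIT 100
""").show()
```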
Big data shouldn't be seen as an end in itself, according to Anil Kaul, co-founder and CEO of Absolutdata. He predicts that big data applications will be subsumed into AI and used as an enabler. "The lack of focus on big data will let the field mature with only the serious players and result in much better business results".
2017 could be the year when data lakes finally become useful, according to Ramon Chen, CMO of data management company Reltio. "Many companies who took the data lake plunge in the early days have spent a significant amount of money, not only buying into the promise of low-cost storage and processing, but also into a plethora of services in order to aggregate and make available significant pools of big data to be correlated and uncovered for better insights".
Chen thinks that, until now, the challenge has been finding skilled data scientists who can make sense of the information while also guaranteeing the reliability of the data. "Data lakes have also fallen short in providing input into and receiving real-time updates from operational applications," says Chen. "Fortunately, the gap is narrowing between what has traditionally been the discipline and set of technologies known as master data management (MDM), and the world of operational applications, analytical data warehouses and data lakes. With existing big data projects recognizing the need for a reliable data foundation, and new projects being combined into a holistic data management strategy, data lakes may finally fulfil their promise in 2017".
Looking much further ahead, Jeremy Achin, the CEO of DataRobot, thinks that by 2032 big data will rely on state-of-the-art prediction involving training an ensemble of locally-optimal models in real time for each data point at the time of prediction. "These models are supplemented by millions of neural networks pre-trained on data so big that in 2016 it would have been impossible to imagine".
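Achin's 2032 scenario is speculative, but today's tooling can at least gesture at the core idea. Below is a toy sketch, assuming scikit-learn and synthetic data, of fitting a small ensemble of locally-optimal models around each query point at prediction time; everything here is illustrative, and it omits the pre-trained neural networks he describes.

```python
# Toy sketch of per-point local ensembles: for each query, fit a fresh
# ensemble on its nearest neighbours and average their predictions.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 2))           # synthetic features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 2000)

index = NearestNeighbors(n_neighbors=50).fit(X)  # neighbourhood lookup

def predict_local(x, k=50):
    """Fit a local ensemble around x at prediction time and average."""
    _, idx = index.kneighbors(x.reshape(1, -1), n_neighbors=k)
    Xn, yn = X[idx[0]], y[idx[0]]
    members = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3)]
    preds = [m.fit(Xn, yn).predict(x.reshape(1, -1))[0] for m in members]
    return float(np.mean(preds))

print(predict_local(np.array([0.5, -1.0])))
```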
There's a good deal of consensus among the experts on the continued rise of tools like Hadoop and Spark, and it seems that 2017 could be the year when big data starts to come of age.
Image Credit: Tashatuvango / Shutterstock