Data science moves away from big data towards containers and cloud
Traditional Hadoop-style big data is giving way to cloud and container solutions like Docker, according to the results of a new survey.
The State of Data Science survey, carried out by Python data science platform Anaconda among more than 4,000 of its users, shows that Docker now makes up 19 percent of data science platforms, ahead of Hadoop/Spark at 15 percent and Kubernetes at 5.8 percent.
This is partly because what was 'big data' in 2005, when Hadoop began, now fits easily into a single server's memory, and because there is now a wide range of alternatives to building a Hadoop data lake.
Among the other findings is that Google Cloud's data services outrank those of Amazon Web Services and Microsoft Azure. Although Google Cloud is the third-largest cloud provider, its focus on data services is paying off with the Anaconda community.
Anaconda is also gaining popularity with software developers (15 percent), in addition to data scientists (16 percent) and academics (16 percent). It matters a lot to users that Anaconda is free, but not so much that it’s open source. Being free was ranked the most important attribute, while open source licensing came second to last.
"The Anaconda Distribution is the data science community's de-facto platform for data processing, visualization and machine learning/AI. The survey shows that data science is undergoing a shift away from traditional big data (Hadoop/Spark) towards cloud-native technologies such as Docker containers, Kubernetes and API-driven applications," says Mathew Lodge, SVP products and marketing at Anaconda Inc. "We're also pleased to see more software developers using the Anaconda platform as machine learning is becoming pervasive and will be integrated with every application."
You can read more about the findings on the Anaconda blog.