DataOps comes to the cloud
Moving data into the cloud creates challenges for enterprises that still rely on traditional data integration software or single-purpose data import tools.
DataOps specialist StreamSets is launching new features that help companies efficiently build and continuously operate dataflows that span data centers and the big three cloud platforms -- Microsoft Azure, AWS, and Google Cloud Platform.
Features include development automation through a fully featured dataflow designer with 'easy button' connectors for Amazon S3, Elastic MapReduce (EMR) and Redshift; Azure Data Lake Storage, HDInsight and Azure Databricks; Google Cloud Dataproc; and Snowflake.
There's also elastic scaling of cloud, multi-cloud and reverse hybrid cloud dataflows via Kubernetes; new data drift handling that automatically propagates changes in source schemas to cloud data services; and a new CI/CD framework that automates frequent dataflow changes through repeated design, test, validation and deployment steps. New central governance of StreamSets Data Protector policies also helps detect and handle sensitive data.
"Already the majority of our customers use StreamSets for cloud dataflows, and we see first hand their struggle to orchestrate end-to-end management of data movement across a growing range of on-premises and cloud platforms," says Arvind Prabhakar, CTO of StreamSets. "Our platform was architected as cloud-native from the start, allowing us to easily evolve with the market. Cloud drift-handling and CI/CD for dataflows are unique enhancements that help our customers on their journey from traditional data integration to modern DataOps."