How the real-time data gold rush creates steep learning curves for developers [Q&A]
Industry analyst firm IDC predicts that by 2025, 30 percent of all data will be real-time. The avalanche of streaming data frameworks, libraries and processing engines has created a massive learning curve for developers.
We spoke with Craig Blitz, product director at cloud native application platform company Lightbend, to learn more about where we are in these early days of streaming data development, and how Lightbend's newly launched open source framework, Cloudflow, aims to support developers pursuing real-time use cases.
BN: Where is the average enterprise today on the streaming data adoption curve?
CB: At the organizational level, there's pretty universal consensus around the value of adding intelligence and real-time analytics to applications, and I think most organizations know that the use cases they are pursuing around AI and similar technologies are not possible with older infrastructure. Some organizations are steeped in Hadoop big data infrastructure, while others missed the big data evolution and are starting from scratch. The industry is still very early on streaming data, and what's pretty common out there today is a lot of thrashing at the technology choice phase of the move to real-time systems.
BN: What's so hard about the technology choices?
CB: It's the diversity of technology choice at every layer. Where do I apply Kafka, Flink, and Akka? Do I continue down the road of using Java, or do I move to Scala? And where does Python fit in? All of these types of choices are overwhelming because you often don't understand the tradeoffs until there has been a lot of trial and error. At least the developers who come from the Hadoop ecosystem have some basic experience with Apache Spark, but for the rest it's hard to know where to even start.
From there, getting these frameworks integrated and scalable is a major challenge that very few developers have much practical experience with. The tools themselves are constantly being improved, and keeping up with those new releases is another challenge. For example, Spark recently introduced structured streaming.
And it's also worth noting that unlike Netflix or LinkedIn or those types of companies that were "born cloud-native", most enterprises are still largely on-prem, with mandates to move to the cloud. So at the same time they are dealing with new application architectures for streaming data, they have cloud migration issues, and new technologies like Kubernetes and Istio that they are wrestling with.
BN: What is Cloudflow and how will it help developers navigate streaming data?
CB: If you think back on technology waves like the earliest days of web applications, app servers played a really important role in codifying the subcomponents of the application stack. Using app servers gave developers confidence that things would work together well and predictably under application load.
To date, streaming data has been the Wild West, and developers have had to become deeply familiar with the underlying frameworks. That demands a large investment in understanding the frameworks themselves, which distracts from what you can actually build with them.
Cloudflow is an opinionated framework that assembles all of these technologies and makes them much more consumable for developers. A typical application comprises several processing steps: perhaps Akka Streams for ingestion, Spark or Flink for calculating data aggregations, and Kafka as a buffer between processing elements, with data then sent downstream to HDFS, to a database, or exposed as an HTTP endpoint. By getting all of these components to talk to each other out of the box, we spare developers from having to figure out the lower-level primitives, so they can work on the business logic of the streaming data use cases they are pursuing instead of figuring out how to run things and connect them together.
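To make the shape of such a pipeline concrete, here is a minimal, dependency-free sketch in Python. This is not the Cloudflow API; it simply mirrors the structure Blitz describes: an ingestion step, an aggregation step, and a sink, with an in-process buffer standing in for Kafka between stages. All names (`ingest`, `aggregate`, `run_pipeline`) are illustrative.

```python
from collections import deque

def ingest(raw_events):
    """Ingestion step (the Akka Streams role): parse raw records into (key, value) pairs."""
    for line in raw_events:
        key, value = line.split(",")
        yield key, float(value)

def aggregate(pairs, window_size=3):
    """Aggregation step (the Spark/Flink role): emit a rolling sum per fixed-size window."""
    window = deque(maxlen=window_size)
    for key, value in pairs:
        window.append(value)
        if len(window) == window_size:
            yield key, sum(window)

def run_pipeline(raw_events):
    """Wire the steps together; the generator between them stands in for the Kafka buffer,
    and the returned list stands in for the downstream sink (HDFS, a database, an endpoint)."""
    buffer = ingest(raw_events)        # stage 1 -> buffer
    return list(aggregate(buffer))     # buffer -> stage 2 -> sink

events = ["sensor-1,1.0", "sensor-1,2.0", "sensor-1,3.0", "sensor-1,4.0"]
print(run_pipeline(events))  # rolling sums over windows of three readings
```

The point of the sketch is the wiring, not the logic: each stage only knows its inputs and outputs, which is what lets a framework like Cloudflow swap in real engines and buffers without changing the business code.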
The philosophy behind Cloudflow is that we want to expose the full power of the technologies we integrate, but remove the operational complexity and boilerplate that’s a frequent cause of project delays. We do this by providing wrapper APIs to the streaming engines that allow us to handle deployment concerns while still allowing developers unfettered access to underlying native APIs. Cloudflow is the 'app server' that ties it all together.
BN: How fundamental is streaming data going to be to the application stack at the typical enterprise in, say, five years?
CB: Just think back to 15 years ago, when getting a loan meant going to a bank to meet a loan officer, who collected your data, and then called you back at some point. Then that application moved online, and competition between banks started to shift to who could respond fastest with the most competitive rates. Today you can’t even compete in the loan industry unless you can deliver instantaneous responses to loan applications.
We're seeing that sort of narrowing of time across every important business function in every industry. Machine learning models are embedding more intelligence into those shrinking intervals, and every enterprise is looking for ways to shave time off business outcomes.