How streaming can help developers improve their handling of data [Q&A]
Data is increasingly important to companies, but making effective use of it presents a number of challenges.
NoSQL database company DataStax has recently launched its new Astra Streaming service. Based on Apache Pulsar, the service aims to make it easier for developers to run application streaming alongside database instances such as Cassandra.
We spoke to Chris Latimer, vice president of product management at DataStax, to find out how this can help teams take a consistent approach across the different components of their applications.
BN: We know data is more important to businesses today, everybody tells us that. But how do companies go about building a data architecture that helps them achieve their goals in practice?
CL: The biggest challenge around data today is not the individual tools involved, but how to get them integrated effectively at scale. To use data more efficiently, companies have to get all the different components working together in production, so that they get the right output from their data at rest and their data in motion.
The best companies in the world look at this from a holistic perspective -- how can they get their application and infrastructure teams working in the same way, and make scaling up easier? They look at the entire open data stack, rather than at individual components.
The open data stack is built on three areas: the data technologies used, such as databases and streaming, to store and process data in motion and at rest; the use of modern application deployment and orchestration methods like Kubernetes to deploy on cloud and/or on-premises infrastructure; and then access via APIs so that developers can use and interact with data independently.
All the individual components for this exist today, so that is not the challenge. Instead, it is how to get these components working together effectively in production, and ready to grow alongside the volume of data that enterprises generate today. Getting this stack approach in place helps solve those problems and makes it easier to scale up around data alongside your applications.
BN: Application streaming data seems to be growing in importance as part of all this, but what impact does it have on people in their everyday lives?
CL: Today, we want to get responses from our applications that are personalized and relevant to us, and we want those responses in real time. For example, we might want to get real-time offers while we are shopping, so we get the best deals. We might want the best game experience, with real-time responses to what we are doing. Or we might want more interactive services from the government organizations we deal with.
Equally, companies can use event streaming to support responses to customer activity such as detecting fraud, or interacting with data from things like Internet of Things devices. Event streaming can be used to detect certain states and respond to them. What we want -- and what companies want to achieve -- is to get those responses completed with zero latency.
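To make the idea of detecting a state in a stream and responding to it more concrete, here is a minimal sketch in plain Python. It is not how Astra Streaming or Apache Pulsar implements anything: the event shape, the `detect_bursts` function, and the thresholds are all hypothetical, and a simple list stands in for a live topic. It flags a card that makes more than a set number of payments inside a sliding time window.

```python
from collections import defaultdict, deque

# Hypothetical thresholds chosen for illustration only.
WINDOW_SECONDS = 60
MAX_PAYMENTS_IN_WINDOW = 3


def detect_bursts(events):
    """Yield an alert whenever one card makes too many payments in the window.

    `events` is an iterable of (timestamp_seconds, card_id) tuples, assumed to
    arrive in timestamp order, as they would from a streaming topic.
    """
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    for timestamp, card_id in events:
        window = recent[card_id]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_PAYMENTS_IN_WINDOW:
            yield {"card_id": card_id, "at": timestamp, "count": len(window)}


# A stand-in for a live stream; a real deployment would read from a broker.
sample_events = [
    (0, "card-A"), (5, "card-B"), (10, "card-A"),
    (20, "card-A"), (30, "card-A"), (200, "card-A"),
]
alerts = list(detect_bursts(sample_events))
print(alerts)
```

Because the detector is a generator, each alert is produced as soon as the triggering event arrives, rather than after a batch has been collected. That is the property that matters for the real-time responses described above.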
BN: What problems might arise if one part of your developer team takes a different approach than another?
CL: If you have different processes or approaches in place around one element of your data compared to another, it adds overhead to the work you do with that data. For instance, you can have one element that scales in a horizontal way by adding nodes to its clusters without needing any downtime. Another element may add nodes to expand, but need some downtime to reorganize the data so that it is evenly spread across the cluster.
This difference can slow down your team, and mean they have to think about infrastructure and capacity in advance, rather than concentrating on their applications. Alternatively, it can lead to you spending more upfront to avoid the problem, rather than increasing your spend as you scale up.
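As a rough illustration of why reorganizing data after adding a node can be costly, the sketch below counts how many keys change node when a cluster grows from four nodes to five. It assumes a deliberately simple hash-modulo placement scheme, which the interview does not attribute to any particular product; the `node_for` and `fraction_moved` helpers and the key names are hypothetical.

```python
import hashlib


def node_for(key, node_count):
    """Place a key on a node with simple hash-modulo placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def fraction_moved(keys, old_nodes, new_nodes):
    """Return the share of keys whose node changes when the cluster grows."""
    moved = sum(
        1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_nodes=4, new_nodes=5)
print(f"{moved:.0%} of keys change node when growing from 4 to 5 nodes")
```

Under this naive scheme, a key keeps its node only when its hash gives the same remainder for both cluster sizes, so in expectation about four fifths of the keys move. Systems designed to scale without downtime generally use placement strategies that move far fewer keys when a node joins.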
BN: How should the industry as a whole approach this?
CL: The biggest challenge here is the level of expertise and specialist knowledge that data projects and strategies require. Looking at that open data stack as a whole can help to solve some of the problems around how companies can be more successful around data.
What this will encourage is more standardization. Standardization is necessary for us all to progress around data science, but it’s not just about architectural hygiene. New approaches to applications like Kubernetes make it easy to standardize how your application will work in the cloud, and this same approach can be used for connected data architectures too. This standardization around an open data stack is needed so that more organizations can pick up and use data effectively.
Image credit: agsandrew / Shutterstock