Enterprises struggle to manage big data flows
Enterprises of all sizes face challenges across a range of data performance management issues, from stopping bad data to keeping their data flows operating effectively.
That is the finding of a survey by data performance management specialist StreamSets, in which nearly 90 percent of respondents reported flowing bad data into their data stores, while only 12 percent consider themselves good at the key aspects of data flow performance management.
Data quality is cited as the most common challenge when managing big data flows (selected by 68 percent). In addition to bad data flowing into stores, 74 percent of organizations report currently having bad data in their stores, despite cleansing data throughout its lifecycle. While 69 percent of organizations consider the ability to detect diverging data values in flow as 'valuable' or 'very valuable,' only 34 percent rated themselves as 'good' or 'excellent' at detecting those changes.
Respondents felt weakest at detecting performance degradation (44 percent confident in their capabilities), error rate increases (44 percent) and divergent data (34 percent). The only measure where a majority (66 percent) felt confident was detecting a 'pipeline down' event. What is common across all performance indicators, though, is the gap between respondents' self-reported capabilities and how valuable they considered each competency.
The survey also identifies problems caused by data drift (unexpected changes in data structure or semantics). Some 85 percent say this has a substantial impact, and 53 percent report that they have to alter each data flow pipeline several times a month, with 23 percent making changes several times a week or more.
What's also interesting is the continued prevalence of hand coding, with 77 percent using it to design their data pipelines. Two-thirds also use legacy ETL (Extract, Transform and Load) and data integration tools.
"In today's world of real-time analytics, data flows are the lifeblood of an enterprise," says Girish Pancha, CEO of StreamSets. "The industry has long been solely fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data. It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today's data".
More information on the results is available in the full report, which you can download from the StreamSets website.