How DevOps can better communicate with the business

Business leaders rely on business intelligence, and that intelligence is underpinned by the hard work of their DataOps teams, who keep the complicated pipelines behind data science and BI flowing. Given how central data is to the whole process, the technical team cannot be expected to manage it alone; business leadership must first understand what effective DataOps actually entails. When that understanding is missing, friction grows as strategic direction runs into the practical limits of technology, time, skills, and budget.

The fundamental promise of big data and data processing is to give organizations the insights to make more intelligent decisions. Yet organizations rarely focus on how they collect their data as much as they focus on how they act on it. In other words, they don't give due consideration to the data they have about... well... collecting data. That is an oversight, given how much of a game-changer these insights can be for operations, reliability and resource use.

Before diving into how to collate and make use of this data, however, it is instructive to look at what kinds of metrics can be collected about data processing and storage. Used properly, these can drastically improve processes and workload management.

A good place to start reviewing your existing metrics-gathering processes is, perhaps surprisingly, culture. While this may sound like a strange place to begin, the methods for collecting this data are typically intrinsic to the teams charged with gathering it. Shaping that culture within your organization is the best way to increase metric coverage and ensure you are generating the insights you need.

An easy place to start is encouraging teams to deal with data piece by piece. The amount of data produced by organizations -- even small ones -- is considerable, and it can seem overwhelming. But it does not all need to be dealt with at once. Instead, prioritize the data that will best drive the outcomes most important to your organization; it will generate the most value, and other data types can be considered from there. Another consideration is keeping the process as simple and transparent as possible for developers and analysts. In organizations where the process has become convoluted, these teams get lost and gaps emerge in data collection. One easy way to simplify data collection is to make it a requirement for deploying jobs into production.
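
For example, a pre-deployment gate can refuse any job that does not declare who owns it and whether it emits run metrics. Below is a minimal sketch in Python, assuming jobs are described by a simple manifest; the field names (owner, team, data_sources, metrics_enabled) are illustrative rather than taken from any particular scheduler.

REQUIRED_METADATA = ("owner", "team", "data_sources")

def validate_job_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the job can go to production."""
    problems = [f"missing required metadata: {field}"
                for field in REQUIRED_METADATA if not manifest.get(field)]
    if manifest.get("metrics_enabled") is not True:
        problems.append("jobs must emit run metrics before being deployed")
    return problems

if __name__ == "__main__":
    candidate = {"owner": "jane@example.com", "team": "analytics",
                 "data_sources": ["orders"]}
    for issue in validate_job_manifest(candidate):
        print("deployment blocked:", issue)

The point of a gate like this is less the code itself than the habit it enforces: nothing reaches production without declaring the metadata that makes later metrics collection possible.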

Reviewing your organization's data-gathering process is also an opportune time to ensure those processes are still in alignment with the law. Data protection and privacy should always be a top consideration for organizations, and a number of laws in the past few years have reflected this. Europe's General Data Protection Regulation (GDPR) and California's consumer privacy laws are perhaps the most prominent. While the stipulations made in these laws are numerous -- too numerous for the purposes of this article -- they revolve around several simple premises. Succinctly, they expect that organizations:

  • Know what personal data they are collecting
  • Know where personal data is being stored
  • Know what insights are being derived from personal data
  • Know what internal/external groups have access to personal data
  • Are able to mask personal data
  • Are able to inform the customer what data they have
  • Are able to remove a customer’s personal data if requested
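
Much of this comes down to knowing which fields hold personal data and being able to mask or report on them on demand. The Python sketch below assumes records are flat dictionaries and that personal fields have already been tagged; the field names and the hashing rule are illustrative only.

import hashlib

PERSONAL_FIELDS = {"email", "full_name", "ip_address"}  # tagged ahead of time

def mask_record(record: dict) -> dict:
    """Replace personal fields with a one-way hash so jobs can still join on them."""
    masked = {}
    for key, value in record.items():
        if key in PERSONAL_FIELDS and value is not None:
            masked[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[key] = value
    return masked

def personal_data_held(record: dict) -> dict:
    """Answer 'what personal data do you hold about me?' for a single record."""
    return {key: value for key, value in record.items() if key in PERSONAL_FIELDS}

print(mask_record({"email": "jane@example.com", "order_total": 42.50}))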

Fortunately, in reviewing the different metrics your organization should be collecting, many of these requirements are satisfied naturally. So what data should organizations prioritize? Some data clearly provides greater value than others, and with so many metrics available to collect, those are the ones that should receive focus. Perhaps the easiest metric to collect is information on when jobs are running: it should be relatively straightforward to see what is running, when, and who requested that job. Another simple set of metrics concerns privileges: what permissions do users have, what code or SQL is a job running, when was it started, and so on. These metrics provide a more detailed picture of what data processing is occurring in an ecosystem, yet even these fundamentals are unlikely to be comprehensively addressed by most organizations.
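
In practice, this can be as simple as writing a small record for every run. The Python sketch below shows the kind of per-run metadata described above; the schema and field names are assumptions, not a standard.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class JobRunRecord:
    job_name: str
    requested_by: str          # who asked for this run
    permissions: list[str]     # what the job is allowed to read or write
    statement: str             # the code or SQL being executed
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

run = JobRunRecord(
    job_name="daily_orders_rollup",
    requested_by="jane@example.com",
    permissions=["read:orders", "write:orders_summary"],
    statement="SELECT region, SUM(total) FROM orders GROUP BY region",
)
print(run)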

From here, a logical next step is job execution. With a clear picture of what jobs are running, information on start and end times, execution duration, inputs (the data sources a job reads) and outputs (the destinations it writes to) is key to developing data lineage. These insights help answer the legal questions raised by the data protection laws discussed earlier. Data lineage also creates significant value for performance and operations. Organizations looking to really improve in those areas will need to capture job metadata and information about the data going into and out of each job, so that data teams know what to do in the event of a failure, a slowdown, or a failed data quality check. This level of collection is more focused on job resource utilization and optimization.
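
Captured consistently, that information amounts to a lineage record per job run. The following Python sketch illustrates one possible shape for such a record, assuming jobs can report their inputs, outputs and resource usage when they finish; the structure is illustrative, not any specific tool's API.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class JobLineageRecord:
    job_name: str
    started_at: datetime
    finished_at: datetime
    inputs: list[str]       # data sources the job read
    outputs: list[str]      # destinations the job wrote to
    peak_memory_mb: float   # resource usage, useful for optimization work

    @property
    def duration_seconds(self) -> float:
        return (self.finished_at - self.started_at).total_seconds()

record = JobLineageRecord(
    job_name="daily_orders_rollup",
    started_at=datetime(2024, 1, 5, 2, 0, tzinfo=timezone.utc),
    finished_at=datetime(2024, 1, 5, 2, 14, tzinfo=timezone.utc),
    inputs=["s3://raw/orders/2024-01-04/"],
    outputs=["warehouse.analytics.orders_summary"],
    peak_memory_mb=2048.0,
)
print(record.duration_seconds, "seconds")

Stitching these records together across jobs is what turns isolated run metrics into lineage: you can trace where a dataset came from, which jobs touched it, and which downstream outputs are affected when something breaks.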

To conclude: to generate the business intelligence your organization needs, give due consideration to the metrics you collect. Focus on the metrics that deliver the most value in improving job performance. Given the improvements this brings to operations, reliability and resource use, it should be a no-brainer, and it will fast-track organizations to generating the data insights they need to inform their business intelligence.

Shivnath Babu is co-founder/CTO at Unravel (and Adjunct Professor of Computer Science at Duke University)
