How mismatched incentives create problems for development [Q&A]
Site reliability engineering (SRE), SecOps and developer teams are all supposed to be on the same side.
But mismatches in incentives between these groups can lead to challenges surrounding how and what information is shared across siloed teams. This creates a hazard where one team can shift deployment risk to another team, with no accountability back to the originating team.
We spoke to Nick Heudecker, senior director of market strategy at observability specialist Cribl, to find out more about the problem and how businesses can address it.
BN: What are some examples of mismatched incentives between SREs, SecOps and app developers?
NH: An incentive motivates you to do something. For application developers, the primary incentive is shipping code and adding features quickly, and doing so at a reasonable level of quality, meaning a low bug count. Doing that means adopting new application architectures and infrastructure, like microservices and containers. There may be other incentives as well, like mentoring or training others, but the core function of a developer is to develop.
Site reliability engineers have different incentives. They need to keep systems up and running at a high level of performance and efficiency and are incentivized accordingly. The incentives for SecOps teams are around risk reduction and breach mitigation. The architectures and infrastructure that application developers favor because they support their incentives complicate the lives of SREs and SecOps teams because they obfuscate information.
BN: How do these incentives contrast with each other -- what happens when teams aren't aligned?
NH: The developer incentive to quickly deploy new features conflicts with the SRE incentive to maintain stable, efficient, and high-performance infrastructure. This also conflicts with the SecOps incentive to mitigate risk. The conflict exists because SREs and SecOps teams don't have perfect visibility into what developers have changed.
A fresh code push might only include minor changes and be harmless to existing operations. Or, it could replace large chunks of logic across the entire codebase, including adding calls to external and third-party applications.
To the intrepid and overworked SRE and SecOps teams, one change looks like any other. They don't get a peek behind the curtain because of the mismatched incentives. Developers want to deploy quickly. They may have an executive or product team leaning on them for new features supporting a launch. Waiting on approval from other teams slows down deployments, so comprehensive reviews don’t happen.
This doesn't mean development teams are intentionally trying to sabotage partner teams. They're simply acting in their own interests based on their incentives.
BN: How does all this lead to unbalanced risk sharing in IT? What are the potential security implications?
NH: The challenge is that one party, the developers, has more information than the other parties. That information asymmetry is what creates unbalanced risk sharing. Coping with information asymmetry has led to all kinds of new collaborative models, starting with DevOps and evolving into DevSecOps. I've even seen creations like BizDevSecOps, whatever that's supposed to be!
BN: Can you give us some examples of true collaboration between these three groups?
NH: True collaboration has been hard to come by. Early DevOps efforts designed to ingrain the necessary collaboration are often successful. The challenge is building collaboration at scale. Scaling beyond five to seven teams is difficult because teams lack the breadth of experience in IT operations or the SRE capacity to staff multiple product teams. The change velocity developers can achieve is often far greater than SREs and SecOps can absorb, making information asymmetry worse.
BN: Where does observability come into play, and how does it help build symmetry?
NH: Observability practices, like collecting all events, metrics, traces, and logs, allow SREs and SecOps teams to interrogate applications about their behavior without knowing which questions they want to ask ahead of time. Ideally, this breaks the information asymmetry problem between developers and other teams. However, observability only works if applications, and the infrastructure they rely on, are instrumented.
This creates another problem: who does the instrumentation? The expectation is application development teams embed instrumentation into their code as part of the development process. While a nice idea, there are four reasons this falls short.
- First, the quality of instrumentation varies. Many log statements are terse and understandable only to the developer who wrote them.
- Second, instrumentation libraries vary by implementation, giving inconsistent results across language bindings. OpenTelemetry is trying to improve this, but its progress is slow and still requires developers to do more work that, if we're honest, doesn't benefit them. It benefits SREs and SecOps, and now we're back to those pesky mismatched incentives and information asymmetry problems again.
- The third problem with instrumentation is the volume of data. Each instrumented application can produce terabytes of data each day. When you have robust instrumentation, the amount of data can be overwhelming, and extremely costly to analyze and store.
- Lastly, instrumentation is isolated to the code your team wrote. That represents a fraction of the code you rely on. Vendor-provided services and APIs remain a black box, limiting your observability into those components.
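To make the first point in the list above concrete, here is a minimal Python sketch contrasting a terse log message with a self-describing event. It is an illustration only: the function names, the `checkout` service name, and the field names are hypothetical and do not come from any particular library or from the interview.

```python
import json
import logging
from datetime import datetime, timezone


def terse_message(order_id):
    # The kind of message that only its author can decode later.
    return f"fail {order_id}"


def structured_event(order_id, reason, service="checkout"):
    """Build a self-describing event as a JSON string (hypothetical fields)."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "event": "order_failed",
        "order_id": order_id,
        "reason": reason,
    }
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("example")
    log.info(terse_message(42))
    log.info(structured_event(42, "payment_gateway_timeout"))
```

The second form costs the developer a little more effort, which is the incentive problem in miniature: the extra fields mostly help whoever reads the log, not whoever writes it.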
Operations teams need instrumentation without having to go back to developers and beg them to add it to existing code. They need to turn it on and off as needed, and they need readily consumable data. They also need every piece of data they can get, including packet payloads, insight into encrypted data, and so on. This goes well beyond what's possible with today's instrumentation options.
AppScope, a newly released open-source project, is a new take on instrumentation. AppScope interposes itself between application threads and system libraries, tracking things like file system access, network, and HTTP activity, as well as CPU and process activity. It also provides payload data, and because it sits between the application and encryption libraries, it also gives access to users' cleartext data. Because it works with any Linux binary, SREs and operations teams can instrument anything, even code they didn't write.
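The general idea of interposition can be sketched with a toy Python analogy. This is not how AppScope is implemented, and it uses none of AppScope's interfaces; it only shows, under that assumption, how a wrapper placed between a caller and a library function can record activity without touching the caller's code. All names here (`observe_file_opens`, `unmodified_application`, the `fs.open` event label) are hypothetical.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def observe_file_opens(events):
    """Temporarily wrap builtins.open and record each call in `events`."""
    original_open = builtins.open

    def recording_open(file, mode="r", *args, **kwargs):
        # Record the call, then hand off to the real function unchanged.
        events.append({"event": "fs.open", "path": str(file), "mode": mode})
        return original_open(file, mode, *args, **kwargs)

    builtins.open = recording_open
    try:
        yield events
    finally:
        # Always restore the original, so observation can be switched off.
        builtins.open = original_open


def unmodified_application(path):
    # Stand-in for code the operations team did not write and cannot edit.
    with open(path, "w") as fh:
        fh.write("hello")
    with open(path) as fh:
        return fh.read()


def demo():
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with observe_file_opens(events):
            result = unmodified_application(path)
    return result, events
```

Calling `demo()` returns the application's result together with one recorded event per file open, while `unmodified_application` itself contains no instrumentation.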