The risks and rewards of shadow AI [Q&A]

As with other forms of 'off-the-books' shadow tech -- tools used by employees without company approval -- shadow AI is a double-edged sword.

Cyberhaven Labs recently reported a sharp 485 percent increase in corporate data flowing to AI systems, with much of it going to risky shadow AI apps.

But it also offers enormous opportunities that companies simply can't afford to overlook. Instead of banning it outright, the focus should be on finding ways to safely enable its use while protecting sensitive data and staying compliant.

We talked to Nishan Doshi, chief product officer of data lineage security company Cyberhaven, to find out about the risks and benefits.

BN: What risks do shadow AI apps pose to companies?

ND: Shadow AI reminds me of the shadow SaaS problem from a few years ago. Back then, employees would adopt unapproved SaaS tools to do their jobs faster and more efficiently, bypassing IT and security in the process. The incentives for employees to share data with AI apps are no different today, as more people turn to AI tools to boost productivity or tackle challenging tasks. In fact, recent reports about the surge in corporate data flowing to AI tools highlight just how pervasive this trend has become, as employees increasingly rely on these platforms to get their work done.

But shadow AI is different -- and more dangerous. With shadow SaaS, vendors often had a vested interest in keeping customer data safe because breaches could destroy their reputations and businesses. With shadow AI, the dynamics shift. Data is no longer just something to protect; it's a core resource for vendors, often used to train their models and improve their platforms. That creates a troubling misalignment of incentives. Some vendors are highly motivated to collect, retain, and even exploit this data, which opens up significant security, privacy, and compliance risks. Just look at cases like the FTC's enforcement actions against Amazon's Alexa and Ring divisions, where data retention and improper usage practices have raised serious concerns.

This misalignment makes shadow AI much harder to navigate. It's not just about unauthorized usage -- it's about ensuring data is safe and used responsibly in systems where those goals may not align with vendor priorities. That's why simply banning shadow AI tools isn't the solution. Employees will still find workarounds, and companies will miss out on valuable opportunities to leverage AI for innovation and efficiency.

Ultimately, the key is enabling shadow AI safely. This means creating clear policies around its use, implementing tools to monitor and control data flows to AI platforms, and working with trusted vendors who prioritize secure data handling. It also means educating employees about the risks of sharing sensitive information with unapproved AI tools and providing approved, secure alternatives that meet their needs.

BN: Do you think AI will accelerate IP leaks and data theft across organizations?

ND: AI could absolutely accelerate IP leaks and data theft across organizations, because these tools fundamentally change how users interact with data. Tools like Microsoft Copilot and Glean have evolved from simple chatbots into powerful agents that access vast amounts of company data. They pull information from internal databases, systems, and files, making data retrieval faster and easier than before.

Instead of searching for documents or running complex queries, users now ask questions and receive direct answers. This ease of access can instantly expose any gaps in data classification, access controls, or security policies that might have gone unnoticed before. If a system isn't locked down properly, sensitive data becomes just a question away from exposure.
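To put that in concrete terms, here is a minimal sketch of the kind of permission check an AI assistant needs before it answers over internal data. This is an illustration of the general idea, not how Copilot, Glean, or any specific product actually works; `search_index`, `acl_allows`, and `llm` are invented stand-ins.

```python
# Hypothetical illustration only: filter retrieved documents against the asking
# user's permissions before the model ever sees them. search_index, acl_allows
# and llm are placeholder objects, not a real API.

def retrieve_for_prompt(user_id, query, search_index, acl_allows, top_k=5):
    """Return only documents the requesting user is already entitled to read."""
    candidates = search_index.search(query, limit=top_k * 4)  # over-fetch, then filter
    allowed = [doc for doc in candidates if acl_allows(user_id, doc.id)]
    return allowed[:top_k]

def answer(user_id, question, search_index, acl_allows, llm):
    docs = retrieve_for_prompt(user_id, question, search_index, acl_allows)
    context = "\n\n".join(doc.text for doc in docs)
    # The model only ever receives permission-checked context.
    return llm.complete(f"Answer using only this context:\n{context}\n\nQ: {question}")
```

If the `acl_allows` step is missing or the underlying permissions are stale, the assistant will happily summarize whatever the index returns, which is exactly the gap Doshi describes.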

The bigger problem is that most traditional access control, DLP, and data security tools weren't built for this kind of interaction. Companies will need to rethink their entire data security stack to ensure it can handle these new AI-powered workflows. Without addressing these challenges, organizations risk seeing an increase in data leaks, theft, and other security incidents.

BN: How can companies better protect themselves from these potential data leaks?

ND: Organizations could start by:

  • Gaining visibility into data movement: Implement solutions that provide real-time visibility into how data moves within and outside the organization. This reduces the burden on security teams and enhances collaboration with business partners to mitigate risks effectively.
  • Classifying data comprehensively: Extend data classification capabilities to cover a wide range of data types and use cases. Contextual data classification goes beyond content scanning to understand where data originates, who interacts with it, and where it resides. This contextual understanding is crucial for enforcing precise security policies.
  • Accelerating incident response time: Enhance incident response capabilities with a detailed, step-by-step analysis of events leading up to security incidents. This approach enables organizations to respond swiftly and effectively to potential data leaks or breaches.

The challenge is that existing data security tools traditionally focus on scanning content to identify data types. However, they often lack the context needed to differentiate between sensitive corporate data and benign information. For example, when an employee uses a personal AI account to debug code by pasting snippets, understanding whether it’s proprietary source code or open-source library material is critical.
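As a rough illustration of that contextual approach, a lineage-aware check might label a pasted snippet by where it was copied from rather than by content alone. The event fields and host lists below are invented for the example and are not Cyberhaven's actual schema.

```python
# Illustrative sketch only: classify a clipboard paste into an AI app by the
# origin of the copied content (its lineage), not just by scanning the text.
# The event schema and host lists are hypothetical.

INTERNAL_REPOS = {"git.internal.example.com"}       # assumed internal origins
OPEN_SOURCE_HOSTS = {"github.com", "gitlab.com"}    # assumed public origins

def classify_paste(event: dict) -> str:
    """Label a paste event using the source of the copied text."""
    origin = event.get("copied_from_host", "")
    if origin in INTERNAL_REPOS:
        return "proprietary-source-code"   # block or warn before it leaves
    if origin in OPEN_SOURCE_HOSTS:
        return "open-source-code"          # likely benign to share
    return "unknown-origin"                # fall back to content inspection

# The same snippet gets a different label depending on where it came from.
print(classify_paste({"copied_from_host": "git.internal.example.com"}))  # proprietary-source-code
print(classify_paste({"copied_from_host": "github.com"}))                # open-source-code
```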

That said, a new security category is emerging to combat insider threats to sensitive data. Data Detection and Response (DDR) combines and replaces legacy Data Loss Prevention (DLP) and Insider Risk Management (IRM) technologies and greatly improves on their capabilities. Research firm Gartner predicts that by 2027, 70 percent of larger enterprises will adopt a consolidated approach to addressing both insider risk and data exfiltration use cases.

BN: Can you explain what a 'Large Lineage Model' is and how it could help mitigate these growing threats?

ND: Unlike a Large Language Model, which predicts the next most likely word in a sentence, a Large Lineage Model, like the one powering Cyberhaven Linea AI, predicts the next most likely action or event to happen to a piece of data.

It's trained on trillions of data flows, tracking billions of data pieces within a company. For example, a CFO handling an executive equity awards spreadsheet is likely to send it internally to another finance team member via Slack, but highly unlikely to share it with a reporter via Signal.
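In pseudocode terms, the concept looks something like the sketch below. This is a loose illustration under assumed names, with a hypothetical `lineage_model` object and an arbitrary threshold, rather than a description of how Linea AI is actually implemented.

```python
# Loose sketch of lineage-based anomaly detection: a trained model assigns a
# probability to the next (user, action, destination) event for a piece of
# data, and highly unlikely events are flagged. lineage_model is hypothetical.

RISK_THRESHOLD = 0.01  # assumed cutoff for "highly unlikely" events

def assess_event(lineage_model, data_item, user, action, destination):
    """Flag an event whose predicted likelihood falls below the threshold."""
    p = lineage_model.probability(data_item, user, action, destination)
    if p < RISK_THRESHOLD:
        return ("flag", p)   # e.g. the equity spreadsheet sent to a reporter via Signal
    return ("allow", p)      # e.g. the same file shared with finance over Slack
```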

Unlike traditional data security products that require security teams to build complex policies for every potential risky event, Linea AI automatically detects these risky events and takes action to protect company data.

BN: What kinds of companies are having these issues today and do you expect the problem to grow?

ND: We're seeing this problem grow across every industry. Pharmaceutical companies are looking for ways to protect their research. For manufacturing firms, it’s safeguarding product designs and production methods.

We're also seeing AI startups and Fortune 100 tech companies coming to us to protect their source code and model weights.

As AI usage increases and reveals more confidential 'dark data' that until now has been hard for employees to find, we’re seeing a growing demand for solutions that safely enable generative AI across various sectors.

Image credit: casarda/depositphotos.com
