The biggest data security risk? Downloading data to a spreadsheet
These days, it seems like every time you turn around another company announces a data breach. At the same time, organizations spend millions on their data warehouses, security solutions, and compliance initiatives. But all of that spending can be rendered useless in an instant by an everyday business workflow: downloading data to a Microsoft Excel spreadsheet.
Of course, business experts aren’t looking to circumvent enterprise governance practices. They’re just trying to get the answers they need to make better business decisions. And because they lack the SQL programming expertise or extensive training required to work with data directly in most business intelligence (BI) tools, they are often powerless to answer the questions raised in the last meeting or email. So they turn to what they know best: the spreadsheet.
Identifying the Problem
Data downloaded to a personal computer creates a multitude of problems:
- Once you extract data from a BI tool to a spreadsheet, the data team loses visibility. They have no idea how employees use or share data.
- The loss of visibility into the data means it’s out of reach for security and compliance oversight, making it vulnerable to misuse or hacking.
- The downloaded data is instantly out of date once exported from the warehouse.
- The extracted data almost certainly represents a subset of the complete dataset, meaning decisions are being made using only a slice of the available data.
The Lure of Downloading Data
If the risks of downloading data to an Excel spreadsheet are so clear, why do so many business experts continue to do it? Here are some reasons that business users opt to analyze data in a spreadsheet:
- It’s easy. Sixty-two percent* of people report using Excel because of its ease of use. Spreadsheets are familiar territory. Unlike BI tools, they don’t require specialized SQL coding expertise.
- It’s fast. Business experts don’t have time to fill out a ticket and then wait for the data experts to come back with an answer -- only to find that the answer, when it finally arrives, raises more questions, leading to more tickets, and more waiting.
- It’s flexible. BI analysis tools offer 'self-service' interfaces for business users, but these interfaces are minimal. They do not allow business experts to ask novel questions, or to follow their curiosity and explore problems in creative ways.
Measuring the Impact
It might seem like downloading data to a personal computer is a trivial issue. But real-world events suggest otherwise. The average data breach costs companies $3.86 million, according to the "2018 Cost of a Data Breach Study" -- up 6.4 percent from 2017.
Even the most innocent mistakes can cost millions. In 2016, a Boeing employee mistakenly emailed his spouse a spreadsheet filled with personal data -- including social security numbers and birth dates -- on some 36,000 other Boeing employees. As a result, Boeing had to offer each affected employee a two-year subscription to Experian’s identity theft protection services. Based on Experian’s service costs, this one spreadsheet error likely cost the company somewhere in the neighborhood of $15 million.**
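As a rough check on that estimate, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: it assumes every one of the 36,000 employees received the full 24 months at the $19.99 monthly list price cited in the footnote, with no bulk discount. The function and constant names are illustrative only.

```python
# Back-of-the-envelope estimate using only figures quoted in the article.
# Assumptions: full uptake by all employees and list pricing with no discount.
EMPLOYEES = 36_000          # affected Boeing employees
MONTHLY_PRICE_USD = 19.99   # Experian's stated monthly price (see footnote)
MONTHS = 24                 # two-year subscription


def monitoring_cost(employees: int, monthly_price: float, months: int) -> float:
    """Total cost if every employee gets a subscription at the given price."""
    return employees * monthly_price * months


total = monitoring_cost(EMPLOYEES, MONTHLY_PRICE_USD, MONTHS)
print(f"Estimated cost: ${total:,.0f}")
```

Under those assumptions the total comes to roughly $17.3 million, the same order of magnitude as the ballpark figure above. Any negotiated discount or partial uptake would lower it.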
Then there’s the effect bad data can have on research and decision-making. A 2016 study in Genome Biology found that roughly one in five genetics papers with supplementary Excel gene lists, published in leading scientific journals, contained errors attributed to autoformatting in Microsoft Excel, such as gene names silently converted to dates.
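To make the autoformatting problem concrete, here is a minimal sketch of how such errors might be flagged in an exported gene list. It assumes the damage takes the common form of a gene symbol such as SEPT2 or MARCH1 being displayed as a date like "2-Sep" or "1-Mar". The function name and the sample list are hypothetical, and the pattern is a heuristic rather than a complete detector.

```python
import re

# Matches values such as "2-Sep" or "1-Mar", the form a spreadsheet may show
# after converting gene symbols like SEPT2 or MARCH1 into dates.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_suspect_gene_names(names):
    """Return the entries that look like dates rather than gene symbols."""
    return [name for name in names if DATE_LIKE.match(name.strip())]


# Hypothetical column of gene symbols exported from a spreadsheet.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1"]
suspects = find_suspect_gene_names(genes)
print(suspects)
```

A check like this only catches the date-style conversions. Other conversions, such as identifiers reinterpreted as numbers, would need their own patterns.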
So how do you avoid these issues? Start by storing your data in the cloud. Even Microsoft’s suggestions for preventing security breaches in its Office products include "consider moving sensitive information and systems to a cloud provider" rather than storing it on a personal computer.
*Computing Curiosity Gap Study, Researchscape, 2018
**We calculated this value based on Experian’s stated cost of $19.99 per month for its service.
Rob Woollen is co-founder and CEO of Sigma Computing. Rob has over 20 years of experience building distributed and cloud systems. He spent six years at Salesforce.com, serving as CTO for the Salesforce Platform and Work.com and as Senior Vice President, Platform Product Management. Rob holds a Bachelor of Science degree in Computer Science from Princeton University.
Jason Frantz is co-founder and CTO of Sigma Computing. Prior to founding Sigma, Jason was at MapR as the architect for their distributed database, MapR-DB, and worked at Clustrix on distributed databases and query optimization. Jason earned a Bachelor of Science degree in Engineering & Applied Science from the California Institute of Technology.