Is dark data valuable?

A tsunami of dark data is coming -- data that has never been analyzed, tagged, classified, organized, evaluated, used to predict future states or control other processes, or put up for sale for others to use. So, what do we do with this data? First, we have to understand that exponentially more is coming. We see this in autonomous technology, where vehicles generate four thousand gigabytes per day.

Data is also becoming more complex, as most of it is already in video or other complicated forms. Seemingly free storage is encouraging people to store more and defer deletion.

There are three rules to follow when determining how to utilize this data:

Rule 1: Don’t be a data hoarder

In "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," Chris Anderson encourages hoarding, without a theory or other evidence that there’s value in the data. Don’t be fooled. Process enough of what you’re hoarding to determine if there are significant findings you can extract. If not, consult your lawyers on regulatory issues and then throw out the data.

Rule 2: Beware of false positives

It’s easy to torture data until it seemingly confesses results that appear statistically significant. Run a thousand statistical tests on a large body of data and, at the conventional 5 percent significance threshold, you’re bound to find dozens of "statistically significant" results: false positives that are truly meaningless findings.
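To see how easily this happens, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method from any particular toolkit: the data is pure random noise, the test is a simple two-sided z-test, and the function names, sample size and seed are all made up for the example. It also shows one common guard, the Bonferroni correction, which divides the significance threshold by the number of tests run.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for the hypothesis that the true mean is 0.

    Assumes the standard deviation (sigma) is known, which holds here
    because we generate the noise ourselves.
    """
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(num_tests=1000, sample_size=50, alpha=0.05, seed=0):
    """Run num_tests tests on pure noise and count 'significant' results.

    Returns (naive_count, bonferroni_count). Every hit is a false
    positive by construction, since the noise contains no real effect.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_values = [
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(num_tests)
    ]
    naive = sum(p < alpha for p in p_values)
    # Bonferroni correction: tighten the threshold to alpha / num_tests.
    bonferroni = sum(p < alpha / num_tests for p in p_values)
    return naive, bonferroni


naive_hits, corrected_hits = count_false_positives()
print(f"'Significant' at 0.05 with no correction: {naive_hits}")
print(f"'Significant' after Bonferroni correction: {corrected_hits}")
```

With no correction, roughly 5 percent of the tests on meaningless noise should clear the bar. The corrected threshold is far stricter, at the cost of making real but weak effects harder to detect.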

There’s a parallel theorem to remember: The infinite monkey theorem. If you have an infinite number of monkeys hitting typewriter keys at random for an infinite period of time, will you discover that they created the entire works of William Shakespeare? Nonsense! The Wikipedia treatment of the theorem proves the futility of the exercise.

The "Real monkeys" section of the text says:

Lecturers and students used a grant to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes crested macaques for a month, with a radio link to broadcast the results on a website.

Not only did the monkeys produce nothing but five total pages largely consisting of the letter S, the lead male began bashing the keyboard with a stone, and the monkeys followed by soiling it.

The lesson here? Don’t assume that there’s value in all data.

Rule 3: Data has both positive and negative value. Measure both and make a business judgement.

Why aren’t you already processing this "dark data"? Processing has costs. To extract value from raw data, you have to dig it out, clean it, sanitize it from a privacy point of view, process it, consume it, and either use it internally or sell it to others.

Unless it’s like the monkey business cited above, data carries regulatory and competitive risks. Breaches, leakage and theft are common occurrences. Anonymization can be reversed, exposing privacy issues, and your data in the hands of competitors can hurt you.

You have to process data in some way to determine value, cost and risk. Business people aren’t going to spend more until the value significantly exceeds the cost. Follow these three rules and you’ll determine whether there is enough value to warrant an investment from your organization in utilizing dark data.


Tom Austin is CEO and Founder of The Analyst Syndicate. Tom spent 24 years at Gartner Inc., with 17 prior years at Digital Equipment Corporation (DEC). He won numerous annual thought leadership awards at Gartner, helped drive early key movers, such as Adobe, to software subscription business models and has interviewed high-profile names at global events like Gartner Symposium, including Steve Ballmer, Lou Gerstner and Eric Schultz.


© 1998-2024 BetaNews, Inc. All Rights Reserved.