What is dark data?
As a society, and as businesses, we used to have a good handle on our data. We knew what it was, where it was kept, and we used it in a very formal way, treating it as something very precious. You could argue that a lot of it was always "dark," as it was locked in files and accessed only by those with physical proximity as well as the permission to use it.
Within organizations, data processing used to rely on very structured, defined data sets, but the rise of social data, the Internet of Things (IoT), machine learning and constantly connected devices has introduced a seemingly unlimited supply of unstructured data. It comes streaming in from multiple sources -- cloud data, device-driven data, social data, financial data, and everything in between...
What is dark data, and how much of a problem is it?
Dark data is all that wonderful data you have in your possession but aren't meaningfully using or analyzing for the betterment of the business. This may be actionable data buried in server log files, point-of-sale feeds, customer call logs or performance review records, or even web analytics. If it’s being recorded, actively or passively, but only used once, or not used at all, then it falls into the category of dark data. It can be "real" data or even metadata, which can tell a whole new level of story beyond the day-to-day analysis. Unstructured data, too, is very often relegated to dark data.
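As a rough illustration of that definition, the sketch below flags data sources as dark when they have been used once or not at all. The `DataAsset` type, its field names, the sample catalog and the one-read threshold are all assumptions made for this example; they do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """A hypothetical catalog entry for one data source."""
    name: str
    read_count: int  # times the data has been used since it was recorded


def is_dark(asset: DataAsset, max_reads: int = 1) -> bool:
    """Treat data as dark if it was used once or not at all (assumed threshold)."""
    return asset.read_count <= max_reads


def find_dark(assets: list[DataAsset]) -> list[str]:
    """Return the names of assets that fall into the dark category."""
    return [a.name for a in assets if is_dark(a)]


# Illustrative catalog; the names and counts are made up.
catalog = [
    DataAsset("server_logs", 0),
    DataAsset("point_of_sale_feed", 1),
    DataAsset("weekly_sales_report", 42),
]
print(find_dark(catalog))
```

In practice the usage count would have to come from access logs or a data catalog, and a sensible threshold will differ from one business to the next.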
Where has dark data come from, historically?
Dark data is not consciously collected; it is a label applied to content the business generates or collects in the course of activity that somehow "falls through the cracks." This means that any data could become dark depending on the way the business gathers and uses it.
So dark data may come from all corners:
- Customers, recorded in calls, emails, forms, and as orders, covering relationships, addresses, purchases or conversations and interactions online
- Financial information from the business, or that which is gathered on or from working with partners, suppliers and customers
- Underused employee data, or data from former employees
- Business research or customer surveys -- especially the raw data that may not all make it through to full utilization
- All types of unstructured information like business emails, presentations, or even meeting notes and whiteboard images
Why should organizations be interested in their dark data?
Businesses today view information as a strategic asset, and are investing heavily to collect data in all corners of their business. But collection is merely the tip of the iceberg. In order to drive tangible business benefits, organizations should naturally gravitate to considering, then discovering and using their dark data. If an organization is still in the early stages of its data journey, then perhaps considering its dark data is a step beyond where it should initially aim. It’s a sensible step for when it has consolidated the skills and tools needed to build expertise in solving its most pressing and current data challenges, and demonstrated the value in the approach.
For those organizations at the right stage to tackle their dark data, the possibilities are broadly similar to the reasons that they began a big data project in the first place: Within the data stores of the business lie clues and insights that have not yet been used or delivered their full return on investment. The exact nature and value of the dark data will vary considerably between organizations, but according to IBM, over 80 percent of data is "dark" and unstructured. It estimates that by 2020 this will have risen to 93 percent. At some point, most organizations will have to tackle this problem, if only for compliance purposes, if not for the promise of business benefits from deeper, untapped insights.
How have organizations been tackling it, from the days of paper to the days of IoT data?
Whilst every organization is different, with different skill-sets, and different data profiles, there are common best practices that benefit all users equally when it comes to managing dark data.
As with all organizational programs there needs to be a clear goal, clear activity owners, and clear accountability. All the functions taking part must understand what their role is and why the exercise has value to the business. This may require training and a culture shift as well.
Unchanged from before the digital age, best practice is to have a strong and universally applied system in the business. Once all parts of the business understand the system, capture the right data, use it properly, and apply a method to re-use it whilst it retains value -- then the data is officially not dark.
Of course you must adhere to data regulations, but also consider the value beyond them. Regulation should encourage minimum standards of excellence, yet show the way to truly first-class behaviors which will elevate the business to a highly competitive posture.
Also, take the time to understand information content, type, size and location. For example, metadata holds its own stories and, when analyzed properly, may show the most surprising insights.
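One way to start on that inventory is a small script that walks a storage location and summarizes what is there by type and size. This is a minimal sketch using only Python's standard library; the `inventory` function name and the choice to group by file extension are assumptions for illustration, and a real audit would also need to cover databases, cloud stores and other systems that a file walk cannot see.

```python
from collections import defaultdict
from pathlib import Path


def inventory(root: str) -> dict[str, dict[str, int]]:
    """Summarize files under root by extension: file count and total bytes."""
    summary: dict[str, dict[str, int]] = defaultdict(
        lambda: {"files": 0, "bytes": 0}
    )
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group by lower-cased extension; files without one share a bucket.
            ext = path.suffix.lower() or "(none)"
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += path.stat().st_size
    return dict(summary)


if __name__ == "__main__":
    # Report on the current directory as an example location.
    for ext, stats in sorted(inventory(".").items()):
        print(f"{ext:10} {stats['files']:6d} files {stats['bytes']:12d} bytes")
```

Even a crude summary like this can show where large volumes of rarely touched files, such as old logs or exports, are sitting.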
Are methods of accessing and using it changing?
Analytics within business has changed a lot since the early days of digital, when the IT department handled all data processing requests. Today, with the ease of use and widespread adoption of "self-service analytics" tools, just as every business user can word process or create their own presentations, there’s no reason why they can’t become a business analyst to a certain degree.
Being able to utilize data from all corners of an organization can be a key enabler to spot trends in your business, or your competitor’s business, that you might never spot otherwise. Once data use is well established and users manage their data reliably, predictive analytics is the logical next step, as it can provide a real boost to the effective operation of an organization and suggest what you should be doing next to drive tangible business benefits. Basing decision-making and forecasting on good, reliable data changes the nature of the whole company, and imparts a new sense of confidence to its leaders.
With that kind of power, the power to "play" with data, those who make the decisions can better see the way to do so, to make efficiencies, to experiment and to iterate better.
Can you give a customer example of dark data being brought to the light?
Alteryx has a financial services customer who wanted to gain insight from its trading terminal data to find correlations between trading patterns and abuses like money laundering and other fraudulent activities. The customer, who is not a data scientist but a typical business user, was using spreadsheets and standard desktop databases to try to analyze portions of the data. Unfortunately, the majority of the data quickly became dark due to its volume and geographically dispersed storage. Once the customer was able to use what had previously been underutilized, and had completed the data prep and analysis needed to identify suspect patterns in transactional records, they used that analyzed data to create sophisticated predictive models. These models can identify activities that indicate the potential for fraud, so that measures can be taken to prevent it before it occurs. This company went from having an unmanaged resource to one that allowed it to detect nefarious activity in advance.
How can the average (if there is such a thing) organization get the people who understand their data to use it better?
In a business that operates "traditionally," there might be two ways to encourage a new, data-led way of working. The most common is when a champion within the business with an interest takes it upon themselves to demonstrate the value they have already seen for themselves. They become an evangelist and show real value to their peers. Then the project snowballs and picks up momentum. Eventually, senior leadership are compelled to pay attention.
Secondly, if the company leadership are on board, they make it happen by demonstrating their own commitment and appointing owners and ensuring there is real resourcing and metrics against the project. Then the business conforms to the direction set from the top. We increasingly see this in the newer role of chief data officer. Organizations of all sizes can benefit from having a CDO, and those that have not come to that realization risk being left behind by those who have, and are fully leveraging their data assets to their best potential.
What’s the future of dark data? Will it increase, will we manage it better?
As those IBM statistics indicate, with the increase in traditional data we will undoubtedly see an increase in dark data in lockstep. As with any area of business, there will be some organizations that leverage dark data better, and these will also undoubtedly be the ones who have already mastered their current data challenges. No business can skip general proficiency and the demonstration of value in the latter before becoming an expert in the former.
Certainly businesses in general will learn to better tap into their dark data; it’s the way things are moving in this everything-internet-connected and measurable world. The real fun and profitability will show itself in those businesses that have opened their data resources (safely and responsibly) within their business so that all members of the workforce are empowered to be curious problem solvers in the style that their role requires.
Organizations, in effect, need to understand "data curation" and "data trust." The curation aspect refers to managing the whole data lifecycle, from initial creation right the way through to retirement -- and ensuring that data either doesn’t become dark, or is found and utilized to best effect at the right time. The trust element comes from a culture willing to put aside the biases and opinions that make up the way humans naturally think, and to be led solely by the evidence of the data.
Some businesses have learned to balance data governance and self-service models to the point where data access has become data curation, ultimately across all corporate data sources, both traditional and "dark."
These will be the ones not only keeping the amount of dark data to a minimum, but continually fizzing with ideas, proactive experimentation, and as an unintended by-product, a great working culture, too.
Bob Laurent, product marketing, Alteryx.
Published under license from ITProPortal.com, a Future plc Publication. All rights reserved.