Enterprises need to step up metadata management now
It’s no exaggeration to say that data drives business today. Organizations are flooded with data on all fronts, particularly as they accelerate their digital transformation and cloud migration. As data proliferates, it becomes increasingly difficult to manage. That’s where metadata comes in. Though metadata is often described as "data about data," it’s actually much more than that.
Metadata is generated whenever data is ingested, accessed by users, moved around an organization, integrated or augmented with data from other sources, profiled, cleaned or analyzed. Together, this information provides the context for other data elements and a complete picture of the data. This holistic view makes it possible to organize and locate data, to understand what it means and to maximize its value.
The insights provided by metadata serve as the foundation for making smart decisions and developing sound strategies. Besides powering business intelligence, metadata also enables organizations to cope with the growing roster of compliance, regulatory and privacy requirements.
But you have to manage metadata properly to realize its many benefits. Tracking technical metadata is the most basic management tactic, but it’s only the starting point, as the surging demand for metadata management tools highlights. Sales in this sector are growing at more than 20 percent a year, and the market is projected to reach $36.4 billion in 2030.
Here’s what you need to consider when devising an effective metadata management strategy.
Understanding types of metadata
To manage metadata properly, it helps to have a basic understanding of the various types with which you’ll be dealing:
- Business metadata categorizes the key figures and information needed for business processes, mapping data to business terms, glossaries, data domains, KPIs, reports and so on.
- Technical metadata describes data formats, structure, models and types. It covers attributes such as physical database schema, mappings, runtime statistics, volume metrics and more.
- Operational metadata indicates how data is used, who’s accessing it, and how often. It encompasses everything from user ratings to traffic patterns, sharing and archiving rules, and audit results.
Here’s a simple example to illustrate what this means. If you were cataloging a music collection, you could capture business metadata such as the name of the album, the artist and the year it was released. Technical metadata would indicate the music format, whether it’s MP3, FLAC or DSD. The operational metadata would show the source of the music, such as a CD or a streaming service like Spotify.
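The music example can be sketched as a small record that keeps the three metadata types separate. This is only an illustration: the `TrackMetadata` class, its field names and the sample values are hypothetical and not taken from any particular catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackMetadata:
    """Hypothetical record for one album in a music collection."""

    # Business metadata: what the asset is, in the owner's own terms.
    album: str
    artist: str
    release_year: int
    # Technical metadata: how the asset is stored (e.g. "MP3", "FLAC", "DSD").
    audio_format: str
    # Operational metadata: where the asset came from (e.g. "CD", "Spotify").
    source: str

    def by_type(self) -> dict[str, dict[str, object]]:
        """Group the fields under the three metadata types."""
        return {
            "business": {
                "album": self.album,
                "artist": self.artist,
                "release_year": self.release_year,
            },
            "technical": {"audio_format": self.audio_format},
            "operational": {"source": self.source},
        }


# Placeholder values, invented purely for illustration.
track = TrackMetadata("Example Album", "Example Artist", 1999, "FLAC", "CD")
grouped = track.by_type()
```

Grouping the fields this way makes it easy to hand each audience only the slice it cares about, for example showing only `grouped["business"]` to a business user.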
Three primary goals
Managing all this metadata involves three specific goals: collect, manage and discover.
- Collect: The collection process encompasses all enterprise systems, both in the cloud and on-premises. That includes everything that houses data -- databases, file systems, analytics, integration tools, etc.
- Manage: Properly handling metadata involves data views with glossary terms, concepts, relationships and processes. This documentation prepares the metadata for use in the business context. User feedback such as ratings, reviews and certifications can indicate how useful the dataset is.
- Discover: The goal of discovery is establishing data relationships and building data lineage, a process that should be automated with artificial intelligence tools. Automated algorithms, along with AI and user feedback, keep the metadata updated.
Catalog for control
An enterprise needs an effective management platform for controlling and leveraging metadata in order to achieve these goals. The most popular tool is a data catalog that incorporates a business glossary and a data lineage system.
A good catalog provides an inventory of data assets, which organizes and tags assets so users can discover the data they need. Search and find functionality is one of the most important features of a catalog. In addition, an effective catalog illustrates the quality and relationships of various data assets. This makes it possible to understand the assets’ positions in the overall data picture and move them through the pipeline.
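To make the inventory-and-search idea concrete, here is a minimal sketch of a catalog that registers tagged assets and lets users find them by keyword. The `DataAsset` and `Catalog` classes and the asset names are assumptions made for illustration; a real data catalog offers far richer search, quality scores and relationship views.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical catalog entry: a named asset with a description and tags."""

    name: str
    description: str
    tags: set[str] = field(default_factory=set)


class Catalog:
    """Toy inventory of data assets with simple keyword search."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def search(self, term: str) -> list[str]:
        """Return names of assets whose name, description or tags match."""
        needle = term.lower()
        hits = [
            asset.name
            for asset in self._assets.values()
            if needle in asset.name.lower()
            or needle in asset.description.lower()
            or any(needle == tag.lower() for tag in asset.tags)
        ]
        return sorted(hits)


catalog = Catalog()
catalog.register(DataAsset("invoices", "Customer invoices", {"finance"}))
catalog.register(DataAsset("web_clicks", "Raw clickstream events", {"marketing"}))
finance_assets = catalog.search("finance")
```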
The business glossary houses the definitions of business terms and other information that’s significant for business users. Think of it as an FAQ of sorts, explaining, for example, what "days past due" means and how that’s calculated.
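A glossary entry can pair the plain-language definition with the calculation itself. The sketch below assumes one common reading of "days past due" (the number of calendar days between the due date and a reference date, never less than zero); an organization's own definition may differ, and the `GLOSSARY` structure is hypothetical.

```python
from datetime import date


def days_past_due(due_date: date, as_of: date) -> int:
    """Calendar days an item is overdue as of a given date (assumed definition)."""
    return max((as_of - due_date).days, 0)


# Hypothetical glossary: each term carries its definition and its calculation.
GLOSSARY = {
    "days past due": {
        "definition": (
            "Number of calendar days between the payment due date and "
            "the reference date; zero if the item is not yet due."
        ),
        "calculation": days_past_due,
    }
}

entry = GLOSSARY["days past due"]
overdue = entry["calculation"](date(2024, 1, 1), date(2024, 1, 31))
```

Keeping the definition and the calculation side by side means business users and engineers are reading the same rule.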
Finally, data lineage demonstrates how data flows in the data environment -- where it came from (its source), where it’s flowing (its destination), how it might have been enriched along the way and what other assets derive from it. Data lineage is essential for meeting regulatory requirements for tracing calculations and preparing data.
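Lineage is naturally modeled as a directed graph of flows between assets. The following sketch, with a hypothetical `Lineage` class and made-up asset names, records each flow and then answers the two questions above: where did an asset come from, and what derives from it.

```python
from __future__ import annotations

from collections import defaultdict, deque


class Lineage:
    """Toy lineage graph: edges point from a source asset to its destination."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_flow(self, source: str, destination: str) -> None:
        self._downstream[source].add(destination)
        self._upstream[destination].add(source)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
        """Breadth-first traversal collecting every asset reachable from start."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen - {start}

    def sources_of(self, asset: str) -> set[str]:
        """All assets that feed into this one, directly or indirectly."""
        return self._walk(asset, self._upstream)

    def derived_from(self, asset: str) -> set[str]:
        """All assets that depend on this one, directly or indirectly."""
        return self._walk(asset, self._downstream)


lineage = Lineage()
lineage.add_flow("crm.customers", "staging.customers")
lineage.add_flow("staging.customers", "report.churn")
upstream = lineage.sources_of("report.churn")
```

In practice these flows would be harvested automatically from pipelines and queries instead of being entered by hand.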
The right tools help
Metadata management is a complex process but one that’s well worth the effort. It’s the basis for business intelligence applications and enables enterprises to evaluate and optimize their processes. Organizations that don’t understand the importance of metadata may not be able to determine what data they have, whether it’s complete and current, and how it relates to other information. As a result, they can’t rely on their data and may wind up expending time, money and resources to reexamine it or even duplicate it needlessly.
By implementing the right solution, organizations can get the most from their data -- using it to make smarter decisions, improve revenue and achieve strategic goals. When choosing a management platform, it’s necessary to evaluate the degree of automation it offers for harvesting and classifying data, as well as its integration capabilities, collaboration options and data protection features, among other things.
As the data environment grows ever more complex, there are strong indications that metadata will become an essential component of data mesh and data fabric, as well as data catalogs, data governance and other enterprise data systems. Metadata can change the way organizations use data and greatly enhance their viability and success in our digital world.
David Kolinek is vice president of product, data governance for Ataccama. Unifying data governance, quality and management into a single, AI-powered fabric across hybrid and cloud environments, Ataccama lets businesses innovate with unprecedented speed while maintaining trust, security and governance of data. Learn more at www.ataccama.com.