Data is a critical asset for any business. We know that. Whatever the size of an organization (large, medium, or small), its data is essential to making business decisions and to remaining competitive. We also know that as the volume of data continues to grow, companies need to make managing their data a priority if they want to understand what has happened in the business, answer questions about why it happened, and make informed decisions going forward.
Data management needs to be part of the overall business strategy so that everyone in the organization understands data and uses it in the same way. But where do you start? There are three tools we recommend that will help keep you organized and will enhance your data management strategy: a business glossary, a data dictionary, and a data catalog.
Although they are related, these are in fact very different tools that your organization can use for different purposes. In this blog post, we will define all three (business glossary, data dictionary, and data catalog) and discuss what’s needed to build and govern each, as well as pros and cons to consider.
A business glossary contains concepts and definitions of business terms frequently used in day-to-day activities across all business functions of an organization, and it is meant to be a single authoritative source of commonly used terms for all business users. It is the entry point for all organizations that have any kind of data initiative in play. A business glossary is the common thread that connects business terms and concepts to policies, business rules, and associated terms within the organization. When creating a business glossary, you should have:
Although you do not need to have a data governance program in place to build, use, and maintain a business glossary, you should still have a governance strategy for the business glossary itself. To reach cross-functional consensus, you need stakeholders from all business functions who are responsible for meeting regularly to discuss terms and concepts that span departments. This allows definitions to be approved and documented, which is especially important if two departments define the same metric differently. It’s fine to have two different definitions so long as the stakeholders have verified that the deviation is acceptable, and it is documented and made accessible to the business users who need it. In some cases, you may have a tie-breaking decision maker, such as the CEO, choose one definition over the other.
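To make the idea of a documented, approved deviation concrete, here is a minimal Python sketch of a glossary term that can hold one approved definition per department. The class names, the "Active customer" term, both definitions, and the "Glossary council" approver are hypothetical, chosen only for illustration; a real glossary tool will have its own structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Definition:
    text: str
    department: str   # business function that uses this wording
    approved_by: str  # stakeholder group that signed off on it


@dataclass
class GlossaryTerm:
    name: str
    definitions: list = field(default_factory=list)

    def add_definition(self, definition: Definition) -> None:
        """Record an approved definition, at most one per department."""
        if any(d.department == definition.department for d in self.definitions):
            raise ValueError(
                f"{definition.department} already has a definition for {self.name!r}"
            )
        self.definitions.append(definition)

    def has_documented_deviation(self) -> bool:
        """True when more than one distinct definition has been recorded."""
        return len({d.text for d in self.definitions}) > 1

    def definition_for(self, department: str) -> str:
        """Return the wording approved for the given department."""
        for d in self.definitions:
            if d.department == department:
                return d.text
        raise KeyError(f"No definition of {self.name!r} for {department}")


# Hypothetical example: two departments define the same metric differently.
active_customer = GlossaryTerm("Active customer")
active_customer.add_definition(
    Definition("Made a purchase in the last 90 days", "Sales", "Glossary council")
)
active_customer.add_definition(
    Definition("Logged in within the last 30 days", "Product", "Glossary council")
)

print(active_customer.has_documented_deviation())
print(active_customer.definition_for("Sales"))
```

Keeping the department and the approver next to each definition means a business user can see not only what a term means for their team, but also that the difference was reviewed rather than accidental.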
Once the business term or concept is defined and approved, the designated stakeholders need to ensure that definition is used consistently throughout the organization. A business glossary is a key artifact for any data-driven organization and will help in setting up future data initiatives as the company’s analytics needs mature. Here’s what to consider when creating a business glossary:
As stated earlier, a business glossary is the starting point for any data initiative, but it is also a prerequisite to building a data dictionary.
Alation’s Business Glossary enables the creation of definitions, policies, rules, and KPIs through a rich, user-friendly interface. A business glossary can be initiated with Microsoft Excel or Google Sheets to get the process started and ensure that it’s working properly. Photo: Alation
A data dictionary is a more technical and thorough documentation of data and its metadata. It consists of detailed definitions and descriptions of data dimension and measure names (in databases, data tables, etc.), their calculations, their types, and related information. Whereas in a business glossary you provide definitions for terms and concepts, in a data dictionary you provide information on the type of data you have and everything related to it. This information is most useful for technical users who work on the backend of your systems and applications, so that they can more easily design a relational database or data structure to meet business requirements. When creating a data dictionary, you should have:
Unlike a business glossary, a data dictionary will likely require that you have a more formal data governance program in place, with a governance committee made up of individuals from both the business and IT sides.
The business team should be responsible for requesting changes to a metric’s definition, while the IT team should be responsible for implementing the change and communicating it to the organization. Establishing lines of communication between the two groups will promote trust. Here’s what to consider when creating a data dictionary:
A data dictionary is a subset of a business glossary, but both are required to build a data catalog.
Whether your data is stored in a data warehouse, data lake, or lakehouse, running dbt docs will propagate table and column definitions to create an automated data dictionary. Source: dbt
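As a rough illustration of what a data dictionary records for each field (name, type, whether it is a dimension or a measure, and any calculation), here is a small Python sketch. It is not tied to dbt or any other tool; the `DictionaryEntry` class, the `build_dictionary` helper, and the `orders` columns are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    table: str
    column: str
    data_type: str
    kind: str                          # "dimension" or "measure"
    description: str
    calculation: Optional[str] = None  # filled in for derived measures only

    def __post_init__(self) -> None:
        if self.kind not in ("dimension", "measure"):
            raise ValueError(
                f"kind must be 'dimension' or 'measure', got {self.kind!r}"
            )


def build_dictionary(entries):
    """Index entries by (table, column) so a field can be looked up directly."""
    index = {}
    for entry in entries:
        key = (entry.table, entry.column)
        if key in index:
            raise ValueError(f"duplicate entry for {entry.table}.{entry.column}")
        index[key] = entry
    return index


# Hypothetical entries for an orders table.
data_dictionary = build_dictionary([
    DictionaryEntry("orders", "region", "VARCHAR(32)", "dimension",
                    "Sales region where the order was placed"),
    DictionaryEntry("orders", "net_revenue", "DECIMAL(12,2)", "measure",
                    "Revenue after discounts and refunds",
                    calculation="gross_revenue - discounts - refunds"),
])

entry = data_dictionary[("orders", "net_revenue")]
print(f"{entry.table}.{entry.column}: {entry.data_type}, {entry.calculation}")
```

Rejecting duplicate keys and unknown kinds at load time is one simple way to keep the dictionary consistent as the business and IT teams request and implement changes.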
A data catalog is the pathway, or bridge, between a business glossary and a data dictionary. It is an organized inventory of an organization’s data assets that tells users, both business and technical, which datasets are available on a topic and helps them locate those datasets quickly. Users have a clear, accessible view of what data the organization has, where it came from, where it is located now, who has access to it, and what risks or sensitivities may be involved, all in one central location. When creating a data catalog, you should have:
In terms of governance, you should follow the same structure as with a data dictionary. In addition, you should have a second, smaller committee of individuals who have both technical and business competencies and who work alongside the data governance committee set up for the data dictionary. The best way to maintain a data catalog is to integrate it as naturally and intuitively as possible into existing processes. For example, whenever a new data source is added, updating the data catalog should be part of the process for adding that source.
Here’s what to consider when creating a data catalog:
A data catalog is an organized inventory of data assets and provides knowledge of all aspects of metadata. Users can access a data catalog without having access to the data asset itself. This saves time and improves employee productivity, and it promotes transparency and trust in the data.
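The sketch below shows, under simplifying assumptions, how a catalog can answer "which datasets relate to this business term?" using metadata alone. The `CatalogEntry` fields mirror the attributes described above (source, location, owner, sensitivity); the class, the `find_datasets` function, and all dataset names and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    dataset: str
    location: str        # where the data lives now
    source_system: str   # where it came from
    owner: str
    sensitivity: str     # e.g. "public", "internal", "restricted"
    glossary_terms: list = field(default_factory=list)


def find_datasets(catalog, term):
    """Return catalog entries tagged with a business term (case-insensitive).

    Only metadata is read; the underlying datasets are never opened.
    """
    wanted = term.casefold()
    return [
        entry for entry in catalog
        if any(t.casefold() == wanted for t in entry.glossary_terms)
    ]


# Hypothetical catalog contents.
catalog = [
    CatalogEntry("customers", "warehouse.crm.customers", "CRM", "Sales Ops",
                 "restricted", ["Active customer", "Churn"]),
    CatalogEntry("web_sessions", "lake/raw/web_sessions", "Web analytics",
                 "Product", "internal", ["Active customer"]),
    CatalogEntry("invoices", "warehouse.finance.invoices", "ERP", "Finance",
                 "restricted", ["Net revenue"]),
]

for entry in find_datasets(catalog, "active customer"):
    print(entry.dataset, entry.location, entry.sensitivity)
```

Because the search touches only catalog metadata, a user can discover that a restricted dataset exists, and who owns it, before requesting access to the data itself.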
Although the terms business glossary, data dictionary, and data catalog sound similar, they play very different roles within your organization. Each is valuable, but not every organization needs all three, at least not right away. It depends on where you are in your analytics maturity and how much time and how many resources you can dedicate to building and maintaining each artifact. As you consider your options, start with: