How to Create a Data Governance Framework: Best Practices and Key Components

Are you concerned about the security and accuracy of your organization’s data? You’re not alone. With the prevalence of data breaches and the potential risks of dark data, implementing a robust data governance framework is more important than ever.

This three-part guide covers why data governance matters, presents a case study that illustrates its importance, and defines the essential elements of a data governance framework, including data inventory, classification, discovery, and mapping.

The importance of data quality will be discussed, along with its integration into a data governance strategy. Implementing best practices can safeguard data from breaches and maintain accuracy and value for future purposes.


There is an immediate need for Data Governance initiatives that determine how to secure data usage, manage activity, and gain visibility into and control over one of your most important assets.

Whether on-premises or in the cloud, more employees are using and sharing an ever-increasing amount of data throughout your organization. The key is to put governance in place without slowing down your organization’s ability to innovate and create impactful solutions.

One key to success is having a set of Data Governance Principles so everyone understands their responsibilities and how to use the data safely. This keeps your team from reinventing the wheel every time they start a project.

Part 1: Why Do We Need Data Governance?

Simply put, a data governance process is about managing data as a strategic asset rather than just collecting it. It involves putting controls in place around data, content, structure, use, and safety. The most prominent example is the use and tracking of personally identifiable information (PII).

As modern business data usage evolves to embrace advanced analytics, artificial intelligence, and machine learning, the volume, velocity, and variety of data in play keep growing. All that data brings a wealth of new possibilities and a new set of challenges. Our primary goal here is to optimize the management and governance of this ever-growing data.

Resource Reference: Creating a modern data governance strategy

1. Data Breach Statistics: How big of a problem?

Hundreds of millions of people are at risk of identity theft or other harm due to recent large-scale data breaches at public and private entities. An important factor was that some of these organizations practiced poor data hygiene. The need for a set of Data Governance policies, procedures, and technologies could not be greater.

Jennifer Kurtz’s NIST article, 20 Cybersecurity Statistics Manufacturers Can’t Ignore (Feb 2020), includes several data breach statistics that drive the danger home.

  • An estimated 74% of companies have more than 1,000 stale sensitive files. (Varonis)

  • An estimated 41% of companies have over 1,000 sensitive files, including unprotected credit card numbers and health records. (Varonis)

  • An estimated 21% of all files are not protected in any way. (Varonis)

2. The Danger of Dark Data

Sometimes your organization may be unaware of some of the data it collects. Wikipedia defines dark data as data that is acquired but never used to drive insights or make decisions. The real issue is that dark data flies under the radar in most organizations, where it can later add storage and processing costs and introduce unnecessary risk.

Need: The ability to discover and catalog data across your various environments.
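As a rough illustration of that discovery need, the Python sketch below walks a directory tree and lists files that have not been modified within a chosen number of days, one simple proxy for stale or dark data. It assumes local file-system access; the function name `find_stale_files` and the 365-day default are illustrative choices, and modification time is used because last-access times are often disabled on file systems.

```python
import os
import time
from pathlib import Path


def find_stale_files(root: str, max_age_days: int = 365) -> list[Path]:
    """List files under `root` not modified in the last `max_age_days` days."""
    cutoff = time.time() - max_age_days * 86_400  # 86,400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort the scan.
                continue
    return sorted(stale)
```

For example, `find_stale_files("/mnt/shared", max_age_days=730)` (a hypothetical share path) returns candidates to review; a real discovery tool would also inspect file contents and owners.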

3. Penalties if you do not get Data Governance correct – GDPR

The most dangerous aspect of data being shared around your organization is that you are only one Excel data pull away from a data breach. This is especially true under regulations such as GDPR, where mistakes can be costly to both your reputation and your finances.

A breach is a danger not only to the data but also to your business and reputation. An IT Information Week survey reviewing cybercrime showed that 10% of breached small businesses shut down in 2019.

A Data Governance program helps minimize this danger by putting controls in place to help manage your data estate without blocking data solutions that will help you gain a competitive advantage in today’s business.

4. Data Governance Case Study

Here are two “back-in-the-day” examples from my consulting work with clients that illustrate what we are up against and help define the scope of the problem.

The Danger of What Data Might be Surfaced

I was helping a client enable search tools over SAN storage. As consultants, we followed a few rules to make sure a client knew what they were getting into. When you throw that much data open to search, you rarely notice what you might bring to the surface. Security by obscurity never really works.

We had a few “gotcha” searches that we would run and then review privately with management, covering topics such as executive salaries, layoffs, popular movies, images, and content from various file types. There was always something that shocked them.

Do I have control over how my Customer Data is used?

In a more recent example on the data side, clients are often surprised by how many copies of their customer information they have, how out of date those copies can be, and how much personal customer data is shared around the company. The number of databases holding this data is almost always a surprise.

Lesson Learned: Without a plan, you invite issues. This was usually the best way to start the governance discussion.

What is a data governance strategy?

A data governance strategy is an integrated approach to managing confidential business information that involves applying policies and procedures to your organization’s various data activities. It’s based on the belief that a company’s data should be treated as a critical asset and used to help improve business operations. Data governance is not so much about restricting access to information as about ensuring that your organization has the right policies and procedures to protect data that may contain sensitive or confidential information.

Without controlling your data, it becomes Dark Data: a black hole waiting to cause issues.
Image by torstensimon from Pixabay

Part 2: Creating a Data Governance Framework

The important point to know about starting any data governance project is that you should not start from scratch. Like any project, there are resources you can leverage to get you started. This section provides various subject areas you can look at when creating your own Data Governance framework.

The following are some key points I like to keep in mind for most of my projects.

  1. “Don’t try to boil the ocean” is important here. The easiest approach is to start with something small and grow into it. You need small victories in a project to build momentum.

  2. You need executive buy-in as this project will involve many departments and functions. You will also need executive-level support to help move things along.

  3. You will not be successful unless you know who owns each portion of the data estate and can be held accountable for it. Define the organizational roles and responsibilities of the team members involved.

  4. Balance the focus on the process and the tools.  Having the best governance plan will not succeed if your users cannot find the information or if the process or tools are too onerous.

  5. As you move through the project, always keep in mind the end goals behind the effort, such as:

    1. To improve the data quality

    2. To improve data management

    3. To make finding and using data easier

    4. To improve data security and compliance

    5. (Add any of your goals to the list) 

1. How can a General-Purpose Data Governance Framework Help?

If you are just starting out, the best place to begin is with a general-purpose framework, keeping the keys to success above in mind.

A general-purpose data governance framework is a set of policies and processes that can be applied to most organizations. It may not be tailored to your organization’s individual needs. Still, it may be easier to start with and implement because it doesn’t require any significant changes to your systems or infrastructure.

Some companies use a combination of these frameworks alongside the more individualized approaches. However, because it is nearly impossible for a framework to address every need, you may still need a flexible approach to achieve the full benefits of data governance in your situation.

The Data Governance Institute (DGI) is an organization that provides vendor-neutral data governance best practices and guidance. It has published the resource Data Governance Framework & Components, which provides a good overview and includes the following two whitepapers:

  • In-Depth: The DGI Data Governance Framework

  • How to Use The DGI Data Governance Framework to Configure Your Program

2. What is a Data Governance Framework?

A data governance framework is a great place to start when designing a set of policies and processes. It provides guidance you can use to improve data quality, security, privacy, and compliance. Companies with poor data quality or security can suffer from many problems, including low customer trust in their ability to handle personal information securely.

Data governance frameworks can be complex, but they are worth the investment because they strengthen the security of your data collection, storage, and usage against internal and external threats. This does not mean you cannot start small and build out your plan over time.

Your plan should consider the following items covered in the next few sections.

3. Set Up a Data Governance Center of Excellence (COE)

In many of the analytics projects I have been involved in, big or small, the team needed to guide those using the data solutions and data architectures it created.

A center of excellence centralizes resources, guidance, and up-to-date assistance for those using data in your organization. Its main outcomes are maintaining consistency in delivering high-quality data solutions and making sure time is not wasted reinventing the wheel with each project.

Resource: Establish a Center of Excellence – Power BI. This link provides more detail on a Center of Excellence example from the Power BI side of the business.

The following seven items should be key tenets for your Center of Excellence.

Provide Data Governance Principles

This documents your organization’s overall approach to data: how you collect it, what you should collect, how you store it, how long you keep it, and who should have access to it. Having these principles front and center also serves as a reminder to the teams.
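Principles such as “how long you keep it” are easier to enforce once they are written down in a form that tools can check. The sketch below is a minimal, hypothetical example: the classification labels and retention periods in `RETENTION_DAYS` are placeholders, not recommendations, and should come from your own policy and legal requirements.

```python
from datetime import date, timedelta

# Hypothetical retention rules: classification label -> maximum days to keep the data.
RETENTION_DAYS = {
    "public": 3650,
    "internal": 1825,
    "confidential": 365,
}


def is_past_retention(classification: str, created: date, today: date) -> bool:
    """True when data with this label has been kept longer than the policy allows."""
    limit = RETENTION_DAYS.get(classification)
    if limit is None:
        # An unknown label is itself a governance gap, so surface it instead of guessing.
        raise ValueError(f"No retention rule for classification {classification!r}")
    return today - created > timedelta(days=limit)
```

For example, `is_past_retention("confidential", date(2022, 1, 1), date(2023, 6, 1))` returns `True`, because 516 days exceeds the 365-day placeholder limit.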

All Actions and Data Must be Auditable

You need to be able to report on and monitor your progress by tracking various metrics, covering not only the current status of your data but also how it changes over time. For example, you need to be able to audit against the items in your governance program and have access to usage reports showing which data sources and fields are used and how reports are shared.
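A minimal sketch of the auditing idea, assuming an in-memory log for illustration: every access is recorded as an event, and a usage report is derived from those events. The class and method names (`AuditLog`, `AccessEvent`, `usage_by_source`) are hypothetical; in practice these events would come from your platform’s audit logs and be stored durably.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AccessEvent:
    user: str
    source: str   # data source or report that was touched
    action: str   # e.g. "read", "export", "share"
    at: datetime


class AuditLog:
    """Append-only audit trail kept in memory for illustration; persist it in practice."""

    def __init__(self) -> None:
        self._events: list[AccessEvent] = []

    def record(self, user: str, source: str, action: str, at: Optional[datetime] = None) -> None:
        self._events.append(AccessEvent(user, source, action, at or datetime.now()))

    def usage_by_source(self) -> Counter:
        """How many times each data source has been accessed."""
        return Counter(event.source for event in self._events)

    def events_for(self, source: str) -> list[AccessEvent]:
        """Every recorded event for one source, in the order it was recorded."""
        return [event for event in self._events if event.source == source]
```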

People Must Have Accountability

Someone must own and be responsible for the data. For every report or visual, make sure users know whom to contact with questions or to report issues.

For data consumers to have faith in and trust the data, they need to know who the Subject Matter Experts are and those available to respond to questions. There has to be a culture of responsibility, as once users lose faith in the validity of the data, it can be over.

Data Formula and Calculations

I have been involved in projects where each department had a different way of calculating certain metrics.  This can lead to everyone saying their number is correct or making their own calculations.

For example, Margin% is one of those simple calculations: divide Gross Profit by Revenue. But what do you include to calculate Gross Profit? Some departments were using Operating Profit, and some were using Net Profit.

Each calculation could be correct for how that department looks at its results, but what does a 30% Gross Profit margin mean on a report? Having a clearinghouse of data formulas and calculations allows different groups to see what is behind a number and which calculation belongs on a specific report.
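To make the point concrete, the sketch below keeps one governed definition of Margin% and applies it to three different profit figures. The revenue and cost numbers are made up for illustration; the takeaway is that the same formula yields 30%, 15%, or 10% depending on which profit a department plugs in, which is exactly what a central calculation catalog should pin down.

```python
def margin_pct(profit: float, revenue: float) -> float:
    """Margin% = profit / revenue x 100; the governed part is which profit is used."""
    if revenue == 0:
        raise ValueError("Revenue must be non-zero to compute a margin")
    return profit * 100 / revenue


# Hypothetical figures for a single reporting period.
revenue = 1_000.0
cost_of_goods_sold = 700.0
operating_expenses = 150.0
interest_and_taxes = 50.0

gross_profit = revenue - cost_of_goods_sold            # 300.0
operating_profit = gross_profit - operating_expenses   # 150.0
net_profit = operating_profit - interest_and_taxes     # 100.0

print(f"Gross margin:     {margin_pct(gross_profit, revenue):.1f}%")      # 30.0%
print(f"Operating margin: {margin_pct(operating_profit, revenue):.1f}%")  # 15.0%
print(f"Net margin:       {margin_pct(net_profit, revenue):.1f}%")        # 10.0%
```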

Data Architectures

A data architecture combines data flow models, security methods, and integration patterns that have been tried and tested. Product evaluations and decisions require a lengthy process involving many different departments. How a data solution is architected, including which approved products can be used, should originate from and be documented in the COE.

Data Security & Privacy Policies

You need to protect corporate data and data collected from your customers. You need to define who should have access to what data. Clear security principles need to be front and center.

This is critical to customer success. In an era of frequent security breaches, if people and organizations do not trust how you handle their data, you will lose them as customers.
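One common way to express “who should have access to what data” is to compare a role’s clearance with the data’s sensitivity label and deny access by default. The Python sketch below assumes four hypothetical sensitivity levels and roles; real deployments would rely on the access-control features of your data platform rather than application code like this.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical clearance per role; real values come from your security policy.
ROLE_CLEARANCE = {
    "guest": Sensitivity.PUBLIC,
    "analyst": Sensitivity.INTERNAL,
    "finance_manager": Sensitivity.CONFIDENTIAL,
    "privacy_officer": Sensitivity.RESTRICTED,
}


def can_access(role: str, data_sensitivity: Sensitivity) -> bool:
    """Deny by default: unknown roles get no access at all."""
    clearance = ROLE_CLEARANCE.get(role)
    return clearance is not None and clearance >= data_sensitivity
```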

Governance Review Before Development is Productionalized

Ensure every data project has a data governance document review and sign-off before production. The development team needs to review during the planning phase, but most importantly, they must present their solution for review before release to production. It is much easier to do this before issues arise in production.
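The sign-off step can also be enforced as a simple gate in a release pipeline. In this hypothetical sketch, `REQUIRED_CHECKS` stands in for the items in your governance review document, and a release is allowed only when every check is complete and someone has signed off.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical review items; replace them with the items in your governance document.
REQUIRED_CHECKS = (
    "data_owner_identified",
    "classification_applied",
    "security_review_passed",
    "calculations_match_coe_definitions",
)


@dataclass
class ReleaseRequest:
    project: str
    completed_checks: set[str] = field(default_factory=set)
    signed_off_by: Optional[str] = None


def missing_items(req: ReleaseRequest) -> list[str]:
    """List every review item still outstanding, including the final sign-off."""
    missing = [check for check in REQUIRED_CHECKS if check not in req.completed_checks]
    if not req.signed_off_by:
        missing.append("governance_sign_off")
    return missing


def can_release(req: ReleaseRequest) -> bool:
    """Block promotion to production until nothing is missing."""
    return not missing_items(req)
```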

An interesting article on the difference between Productionalize, Productionize, and Productionise

The Center of Excellence’s main outcomes from this are maintaining consistency in delivering high-quality data solutions and ensuring that time is not wasted reinventing the wheel with each project.


4. What is a Data Inventory?

Taking an inventory of your data can be a complicated process, but it gives you the knowledge you need to build a strong foundation for your company’s future growth. Your inventory can also help you identify areas where new policies or procedures need to be implemented to protect sensitive information or improve the quality of your data. In addition, a well-designed data inventory can also serve as an excellent reference tool for training purposes.

A data inventory or catalog of your organization’s data would, first and foremost, include information on where it’s stored and what’s contained within it. It will also provide you with information on how your business uses that data and who has access to it.

 Things to keep in mind:

  • Who is the Data Owner

  • Who is the Subject Matter Expert (SME)

  • Documentation of formula and calculations
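A data inventory entry can be modeled as a small record that captures the items above. This is a hypothetical sketch: the field names are illustrative, not a standard catalog schema, and the helper simply flags entries that are missing a Data Owner or SME.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One row in a data inventory; field names are illustrative, not a standard schema."""
    name: str
    location: str                                            # where the data is stored
    contents: str                                            # what the data contains
    owner: str                                               # accountable Data Owner
    sme: str                                                 # Subject Matter Expert for questions
    users: list[str] = field(default_factory=list)           # who has access to it
    formulas: dict[str, str] = field(default_factory=dict)   # documented calculations


def missing_accountability(entries: list[InventoryEntry]) -> list[str]:
    """Return the names of entries with no Data Owner or no SME recorded."""
    return [e.name for e in entries if not e.owner.strip() or not e.sme.strip()]
```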

5. What is Data Classification? 

Data classification involves sorting information into categories according to the types of details it contains. This helps you identify the data that is most important and needs to be protected, as well as information that can be shared more freely. A simple way to classify information is to assign each item a label (such as Public, Internal, or Confidential) based on its value or function. However, you could also use many other classification methods, such as colors, dimensions, or even numbers, to help identify different pieces of information.
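As a simplified illustration of rule-based labeling, the sketch below assigns one of three hypothetical labels to a piece of text: `Restricted` if it contains a number that passes the Luhn checksum used by payment card numbers, `Confidential` if it contains an email address, and `Public` otherwise. The regular expressions are deliberately simple assumptions; dedicated classification tools use far richer detection.

```python
import re

# Simple, illustrative patterns (assumptions, not production-grade detectors).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum used by payment card numbers; ignores spaces and dashes."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> str:
    """Return a hypothetical sensitivity label for a piece of text."""
    if any(luhn_ok(m.group()) for m in CARD_CANDIDATE.finditer(text)):
        return "Restricted"
    if EMAIL.search(text):
        return "Confidential"
    return "Public"
```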

6. What is Data Discovery?

Data discovery is a process of searching for and extracting information. Data discovery tools can help you find specific pieces of information by searching for specific keywords, tags, or unique characteristics. Data discovery can be used to find large amounts of data to help you identify trends or patterns. Knowing how to use these tools effectively can help you build an actionable view of your company’s data and result in increased employee productivity and cost savings when looking for solutions that will solve your business problems.
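A minimal sketch of keyword and tag search over a data catalog, assuming the catalog is a simple in-memory list. The `CatalogItem` class and `discover` function are hypothetical; data governance platforms provide this kind of search over metadata gathered by scanning your sources.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogItem:
    name: str
    description: str
    tags: set[str] = field(default_factory=set)


def discover(catalog: list[CatalogItem], keyword: str = "", tag: Optional[str] = None) -> list[str]:
    """Names of items whose name or description contains `keyword` (case-insensitive),
    optionally restricted to items carrying `tag`."""
    kw = keyword.lower()
    return [
        item.name
        for item in catalog
        if (kw in item.name.lower() or kw in item.description.lower())
        and (tag is None or tag in item.tags)
    ]
```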

7. What is Data Mapping & Lineage?

Data mapping shows the relationships between different types of data and their dependencies. It looks at the various sources of your data and how that information is used. Mapping takes into account the movement and matching of fields from one database to another.

Data mapping can help you identify any common or repeating patterns, which are often signs that your business should track that type of data more closely or implement new rules about how it’s accessed and stored. It can also show which systems in your company use certain types of data more than others. This can also help you to build a more thorough understanding of your organization’s needs and capabilities. 
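Field-level mapping can be written down as a table of target fields, source fields, and transformations. The sketch below assumes a hypothetical CRM export (`CustID`, `EmailAddr`, `Created`) being loaded into a warehouse customer table; the field names and the source date format are illustrative.

```python
from datetime import datetime
from typing import Any, Callable

# Hypothetical mapping: target field -> (source field, transformation applied on the way).
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("CustID", int),
    "email": ("EmailAddr", lambda s: s.strip().lower()),
    "signup_date": ("Created", lambda s: datetime.strptime(s, "%m/%d/%Y").date().isoformat()),
}


def map_record(source: dict[str, Any]) -> dict[str, Any]:
    """Apply FIELD_MAP to one source record, producing a target record."""
    return {target: transform(source[src]) for target, (src, transform) in FIELD_MAP.items()}
```

Keeping the mapping as data rather than burying it in load scripts also means the same table can be published as documentation of where each target field comes from.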

Data Lineage includes the process of understanding and showing the full context of your data. Think of it as the visualization of your data’s workflow path and transformations. Mapping the fields from one source to another shows the transformation that occurs as the data moves between parts of your information architecture.
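Lineage is naturally a graph: each asset points to the assets it was derived from. This hypothetical sketch stores that graph as a dictionary and walks it breadth-first to answer a common governance question, “which upstream sources feed this report?” The asset names are made up; a governance tool would build the graph by scanning your systems.

```python
from collections import deque

# Hypothetical lineage graph: each asset -> the assets it is derived from.
UPSTREAM = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["erp_orders"],
    "customers_clean": ["crm_customers"],
    "erp_orders": [],
    "crm_customers": [],
}


def upstream_sources(asset: str, graph: dict[str, list[str]]) -> set[str]:
    """All assets that `asset` depends on, directly or transitively (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # shared dependencies are visited only once
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen
```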

The following screen capture is an example from Azure Purview (since renamed Microsoft Purview), Microsoft’s data governance service. By scanning data sources in your organization, it can map those sources together to form a lineage. This is a high-level image, but the application lets you drill into each object for more detail.

Credit Source: Azure Purview Lineage.

Part 3: Adding in Data Quality 

A Data Quality Framework is a set of policies and procedures to ensure your company’s data is accurate, complete, and current. It can be fairly complex but should provide you with the tools you need to help meet your team’s needs and reduce the risk of losing significant quantities of important information.

Case Study Reference: The Development of a Data Quality Framework and Strategy for the New Zealand Ministry of Health (mit.edu)

A standard data quality framework is one in which standards or guidelines have been established for different types of data. These standards typically cover how the information should be stored, the format it should be presented in, and the type of analysis that should be performed on it before it’s made available to users.
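The sketch below turns three such standards into checks on a single record: completeness (required fields present), format (a basic email pattern), and currency (updated within the last year). The field names, the pattern, and the one-year threshold are assumptions chosen for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical standards for a customer record.
REQUIRED = ("customer_id", "email", "last_updated")
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MAX_AGE = timedelta(days=365)


def quality_issues(record: dict, today: date) -> list[str]:
    """Return human-readable issues; an empty list means the record meets the standards."""
    issues = []
    for name in REQUIRED:  # completeness
        if record.get(name) in (None, ""):
            issues.append(f"missing {name}")
    email = record.get("email")
    if email and not EMAIL_FORMAT.match(email):  # format
        issues.append("invalid email format")
    updated = record.get("last_updated")
    if isinstance(updated, date) and today - updated > MAX_AGE:  # currency
        issues.append("stale record")
    return issues
```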

The following sections provide an overview of the different areas of data quality.

1. Metadata Management Is Part of Data Quality

Metadata is the information about the data contained in your data estate. It can be descriptive, administrative, reference, or any other information you want to provide to people using your data. The main benefit is that it helps you maintain order and can point solution developers to approved and validated data for their projects.

A great example is making reference data available for data consumers to build from. Rather than having each project create its own store listing or date table, a central location that every project can use reduces development time and improves interoperability between solutions.
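A shared date table is a good first piece of reference data to centralize. This sketch generates one row per day with a few common attributes; the column names are illustrative assumptions, and `month_name` depends on the process locale.

```python
from datetime import date, timedelta


def build_date_table(start: date, end: date) -> list[dict]:
    """One row per calendar day from start to end inclusive, for shared reuse."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),   # e.g. 20240101
            "date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "month_name": day.strftime("%B"),          # locale-dependent
            "day_of_week": day.isoweekday(),           # 1 = Monday ... 7 = Sunday
            "is_weekend": day.isoweekday() >= 6,
        })
        day += timedelta(days=1)
    return rows
```

Publishing one such table (or its generator) from the central location means every report agrees on quarters, week days, and keys.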

A Data Style Guide also helps all team members follow the same metadata and style formatting.

2. What is real-time data quality?

A real-time data quality framework is one in which standards or guidelines for different types of data are determined and followed throughout the day. Different systems could operate on their own schedules, depending on the kind of information they’re monitoring. A real-time framework can be implemented so that your people and systems can keep up with current information trends, ensuring you know about any new issues or changes before they become significant. 
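In a real-time setting, checks run on each record as it arrives rather than in a nightly batch. This minimal sketch assumes events arrive as Python dictionaries and raises an alert whenever the share of missing values for one field, over a sliding window of recent events, exceeds a threshold; the window size and 5% threshold are placeholder values.

```python
from collections import deque
from typing import Iterable, Iterator


def monitor_null_rate(
    events: Iterable[dict],
    field_name: str,
    window: int = 100,
    threshold: float = 0.05,
) -> Iterator[tuple[int, float]]:
    """Yield (event_index, null_rate) each time the rolling null rate exceeds threshold."""
    recent: deque = deque(maxlen=window)  # True = value missing, for the last `window` events
    for i, event in enumerate(events):
        recent.append(event.get(field_name) is None)
        if len(recent) == window:  # only evaluate once the window is full
            rate = sum(recent) / window
            if rate > threshold:
                yield i, rate
```

Because it is a generator, `events` can be any iterator, such as a consumer loop reading from a message queue, and alerts are produced as soon as the condition is met.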

Conclusion

Whether on-premises or in the cloud, more employees are using and sharing an ever-increasing amount of data throughout your organization. Data Governance provides guidance on how to use, maintain, and secure your data. With the increase in data breaches, getting a handle on and securing your data is more important than ever, without putting roadblocks in front of the analysts who need this data to make decisions and grow your business.

Update: Indigenous Data Governance

The main thrust of this movement is to give Indigenous peoples the right to manage and govern their data. A great resource is provided by the Global Indigenous Data Alliance. This is an important movement to consider in your work.

CARE is an acronym that stands for Collective Benefit, Authority to Control, Responsibility, and Ethics. CARE was created by the International Indigenous Data Sovereignty Interest Group, a group that is part of the Research Data Alliance. The CARE Principles for Indigenous Data Governance are “people and purpose-oriented, reflecting the crucial role of data in advancing Indigenous innovation and self-determination.” Source: CARE Principles for Indigenous Data Governance


Steve Young

With over 34 years in the tech industry, including 17 years at Microsoft, I’ve honed my Data Engineering, Power BI, and Enablement skills. My focus? Empowering Technical Education Professionals to excel with adding AI to their content creation workflow.

https://steveyoungcreative.com