In an earlier blog, I shared some key tips that a company needs to keep in mind while designing data quality (DQ) processes in Hadoop. The data processes and frameworks I mentioned in that blog are important because they affect not only your data quality program but also your data governance program, if you have one, of course! So, do you have a data governance program? This question is not easy to answer because data governance as a concept is often not fully understood. Your company is probably already doing some data governance without realizing it, which raises the question: what is data governance (DG)?
The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” That’s a loaded definition, but at the end of the day, remember that data governance is really about standards, policies, and reusable models.
If you have a data warehouse (DW), which is the traditional approach to getting insights from data, you probably already have some data governance frameworks in place in the form of standards for your dimensional tables. So when we talk about best practices in DG, the first step is to understand what DG really means for your company.
Data Governance for YOUR Company
Some people assume that data governance is equivalent to master data management (MDM). While there is nothing wrong with that notion, it is simply incomplete. Data governance doesn’t need to be just one platform or one concept. In fact, a sound data governance approach can and should involve more than one platform or project. Data governance should be THE program in your company that sets rules and standards for all data-related matters. For example, if your business needs a sales reporting solution, governance issues are likely to arise, such as:
- Which internal databases have this information?
- Who has access to it?
- Have we defined what we call a ‘customer’ or a ‘vendor’?
- Are the structures of sales data already defined?
- What is the quality of the source data?
- Are there any metrics around data sizes?
The IT teams are responsible for delivering solutions for the project and for providing development and infrastructure services, but it would be the responsibility of the data governance team to give the IT teams guidance on data-related policies and standards. This brings us to the next key consideration.
The Data Governance Council
A data governance “council” is ideally responsible for setting the data governance framework for the organization. The framework itself should be customized for your company’s specific needs, but in general it could include strategic planning tasks such as determining data needs, developing data policies and guidelines, and planning data management projects. The framework could also include ongoing control tasks such as managing and resolving data-related issues, monitoring data policies, and promoting the value of data assets.
Similar to IT project leadership teams, the Data Governance Council needs to include members from both the business and IT. It is critical to get business buy-in for this program, since business members will be actively involved in the DG tasks.
It is also important to have a flexible org structure for the council. A good practice is to follow a top-down approach in which the leadership of the council drives the governance while the business analysts and data stewards implement the policies. The data stewards are responsible for providing the necessary feedback to the leadership.
Implementing data governance brings a huge change to the organization. That’s why it is imperative for the Data Governance Council to come up with a mission that aligns with the business interests and takes into consideration the strengths of the implementation teams. The mission of the company’s data governance program needs to be communicated clearly and succinctly, articulating the main drivers for governance within the organization.
A data governance program can include a multitude of focus areas, and it is important to pick the area that provides the most value to your company. These initiatives can be at the enterprise level or at the project level. Below are some focus areas and a brief description of each:
- Standards and Policy: This sort of program would collect standards, review them, and check them against the corporate standards. Another activity is defining a data strategy for the company and providing support for any siloed projects trying to join the enterprise landscape.
- Data Quality (DQ): This kind of program deals with finding, correcting, and monitoring data quality issues in the enterprise. These programs normally involve software for profiling and cleansing, as well as matching engines. A company’s data quality initiatives often lead to Master Data Management (MDM) projects, which define the master data and give a 360-degree view of domains such as customer or vendor.
- Data Security and Privacy: Every company has compliance and regulatory requirements, and this program would try to address them by setting access management rights, information security controls, data privacy procedures, and so on, particularly for sensitive data.
- Architecture/Integration: This focus area aims to achieve operational efficiency by simplifying data integration architecture components such as data modeling, master data modeling, and service-oriented architecture.
- DW and Business Intelligence (BI): This program promotes building data warehouses and data marts to support both historical reporting and forward-looking reporting.
- Self-service architectures: This kind of program takes into consideration the stewardship and data preparation challenges and aims to build workflows that limit ‘shadow IT’, which arises so often in organizations.
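To make the Data Quality focus area a little more concrete, here is a minimal profiling sketch in Python. It is only an illustration, not a substitute for a profiling tool: the `profile_column` function, the sample `customers` records, and the choice of metrics (completeness and duplicate values) are all assumptions made for this example.

```python
from collections import Counter


def profile_column(records, column):
    """Return simple data-quality metrics for one column of a list of dicts.

    Hypothetical helper for illustration; real profiling tools report far more.
    """
    values = [row.get(column) for row in records]
    total = len(values)
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    # Every repeat of a value beyond its first occurrence counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "rows": total,
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
        "duplicate_values": duplicates,
    }


# Made-up sample data: one repeated customer_id and two missing emails.
customers = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "C3", "email": None},
]

id_profile = profile_column(customers, "customer_id")
email_profile = profile_column(customers, "email")
```

Numbers like these are the kind of evidence a data steward could bring to the council when arguing that a domain such as customer needs an MDM effort.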
It’s A Journey, Not a Destination
It is important to understand that, just like corporate governance, data governance is not a project but an ongoing process. Any ongoing process needs defined goals and a method to measure progress. A recommended approach is to assess progress against a Data Governance Maturity Model. Depending on the focus area you choose for the DG program, your company should also define metrics to measure the success of the program. It is also recommended to use agile practices. Agile principles such as continuous delivery, constant collaboration between IT and business, welcoming change, and continuous attention to technical excellence and good design fit perfectly into data governance practices.
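As a rough illustration of measuring a governance program against defined metrics, the sketch below compares measured values with targets. Everything in it is hypothetical: the `Metric` class, the metric names, and the target values are invented for this example and would in practice come from your own council and focus area.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    target: float    # goal agreed by the governance council (hypothetical value)
    measured: float  # latest observed value
    higher_is_better: bool = True

    def met(self) -> bool:
        """True when the measured value reaches the target."""
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target


def summarize(metrics):
    """Split metrics into met and missed, and report the share that was met."""
    met = [m.name for m in metrics if m.met()]
    missed = [m.name for m in metrics if not m.met()]
    share = len(met) / len(metrics) if metrics else 0.0
    return {"met": met, "missed": missed, "share_met": share}


# Made-up example: one quality metric on target, one issue backlog over target.
scorecard = summarize([
    Metric("customer_email_completeness", target=0.95, measured=0.97),
    Metric("open_data_issues", target=10, measured=14, higher_is_better=False),
])
```

Reviewing a scorecard like this at a regular cadence fits the iterative, agile style described above: each review shows which targets were missed and where the next round of effort should go.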
Just like processes and people, technology is a big part of data governance, and it is ever-changing. Whether you are in business or IT, it is recommended to embrace technological innovation. Innovations in machine learning, cloud, and big data can make data governance initiatives more effective. For example, building a data lake on Hadoop could make storing master data and data warehouse data cheaper and processing it faster.
There is a very good chance your company already has one of these data governance initiatives successfully implemented. My recommendation is to use that as a base for implementing other focus areas. Have a vision for data governance in your company, get buy-in from business and IT leadership, and improve collaboration between IT and business. With these initial steps in place, data governance can successfully evolve and provide true value to the organization!