CIO: 3 Questions to Ask About Your Enterprise Data Lake
Today’s CIOs are no strangers to the concept of the enterprise data lake. Oftentimes, an enterprise data lake is viewed as a panacea for all of a CIO’s data ills, and even as the ‘holy grail’ for those trying to spur digital transformation. Yet many CIOs are still struggling to see the payoff from such data lake investments.
Well-known Big Data commentator Bernard Marr observed that this is likely the result of disconnected goals and a lack of communication between the professionals working directly with enterprise data and those responsible for business performance. In fact, according to a recent study from The Economist, although seventy percent of business executives rated sales and marketing analytics as ‘very’ or ‘extremely important’, only two percent say they have achieved ‘broad, positive impact.’ A similar study by research firm Gartner found that business maturity in using Business Intelligence effectively is still low, with only five percent of respondents indicating they have the ability to take full advantage of advanced analytics.
While both studies reveal the gap between data analysts and decision makers in analytics initiatives, what they overlook is the role data lakes can play in closing such gaps. Ultimately, CIOs need to take a more strategic approach to enterprise data lakes. Here are three questions CIOs should ask themselves in order to reap the full benefits of their data lakes.
Q. Do You Have a Cohesive Strategy for Your Data Lake?
It seems almost instinctive for CIOs to start any data lake initiative by considering to what extent their data lake will be used and building a strategy accordingly. Who will provide the data? Which departments and which data sets are involved? Who will consume the data? These are important questions to ask when CIOs are determining their data lake strategy.
Choosing a data lake technology, e.g. Hadoop or Amazon S3, is only step one of the journey and does not guarantee success. In order to achieve a true ‘victory’, a CIO’s strategy needs to consider not only the technology used but, more importantly, the people and processes required to make the most effective use of the data lake. Hence, CIOs need to find the right tools (Hadoop or other related Big Data technologies); train their teams with the right skill sets and implement processes to facilitate effective use of the information; and enable access without losing control of data governance or compliance.
Oftentimes, ‘data is simply dumped into the data lake’ without prior planning of how it’s going to be used, and only a handful of highly skilled data scientists who have mastered advanced techniques such as predictive analytics or machine learning can make use of it. If a data lake is built without properly enabling people or establishing processes, CIOs will end up facing an array of disparate systems and solutions that, in the end, can’t connect or scale. That makes it impossible to keep up with escalating business needs and data demands, and it recreates yet another accidental architecture.
Q. Is Your Data Lake At Risk of Becoming Another Silo?
For years, CIOs struggled to break the silos created by the use of point solutions for ERP, CRM, Enterprise Data Warehouse (EDW), Cloud and on-premises applications.
A data lake makes it possible to create a centralized repository for connecting all this data together. However, if the enterprise data lake is not leveraged appropriately, it often ends up being just a data dump or, worse still, a ‘data swamp’ in which no one with access can make sense of the information and put it to good use.
First, rules need to be defined for how data collected from different source systems is connected, often referred to as ‘data ingestion’, and thereafter blended to form a consolidated view of enterprise information. For example, leveraging a data lake to build a 360-degree view of each customer requires integrating not only marketing data across different channels, but also data from finance and CRM systems. Second, a metadata model should be put in place to facilitate search, understanding and reuse of data across departments. Without proper metadata management in place, business users will simply be lost in the ‘abyss’ of information. It takes huge effort to make use of data in the first place, let alone to enable employees to collaborate on various data sets so that they can be reused to provide broader-reaching business value.
CIOs need to understand that enterprise data lakes are at the core of enabling critical business initiatives, but the broader organization can benefit from data lake initiatives only when the full potential of the data lake is leveraged to connect, consolidate, share and reuse data.
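To make the two steps above concrete, here is a minimal Python sketch of blending marketing, finance and CRM records into a single customer view, and of a tiny searchable metadata catalog. All of the record fields, the `build_customer_360` and `search_catalog` helpers, and the `DatasetMetadata` fields are hypothetical and chosen purely for illustration; a real data lake would rely on dedicated ingestion and catalog tooling rather than in-memory lists.

```python
from dataclasses import dataclass, field

# Hypothetical sample records from three source systems, all keyed by customer_id.
marketing = [
    {"customer_id": "C1", "channel": "email", "clicks": 12},
    {"customer_id": "C1", "channel": "social", "clicks": 3},
    {"customer_id": "C2", "channel": "email", "clicks": 5},
]
finance = [
    {"customer_id": "C1", "lifetime_revenue": 1200.0},
    {"customer_id": "C2", "lifetime_revenue": 300.0},
]
crm = [
    {"customer_id": "C1", "name": "Acme Corp", "open_tickets": 1},
    {"customer_id": "C2", "name": "Globex", "open_tickets": 0},
]


def build_customer_360(marketing, finance, crm):
    """Blend the three sources into one consolidated record per customer."""
    view = {}
    # The CRM system is treated as the master list of customers (an assumption).
    for row in crm:
        view[row["customer_id"]] = {
            "name": row["name"],
            "open_tickets": row["open_tickets"],
            "lifetime_revenue": None,
            "clicks_by_channel": {},
        }
    for row in finance:
        if row["customer_id"] in view:
            view[row["customer_id"]]["lifetime_revenue"] = row["lifetime_revenue"]
    # Marketing data arrives per channel, so it is summed channel by channel.
    for row in marketing:
        if row["customer_id"] in view:
            channels = view[row["customer_id"]]["clicks_by_channel"]
            channels[row["channel"]] = channels.get(row["channel"], 0) + row["clicks"]
    return view


@dataclass
class DatasetMetadata:
    """Illustrative metadata model: just enough to find and understand a data set."""
    name: str
    owner: str
    source_systems: list
    description: str
    tags: list = field(default_factory=list)


catalog = [
    DatasetMetadata(
        name="customer_360",
        owner="marketing-ops",
        source_systems=["marketing", "finance", "crm"],
        description="Consolidated customer view across channels",
        tags=["customer", "blended"],
    ),
    DatasetMetadata(
        name="raw_invoices",
        owner="finance",
        source_systems=["finance"],
        description="Unprocessed invoice exports",
        tags=["raw"],
    ),
]


def search_catalog(catalog, keyword):
    """Return the names of data sets whose description or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        m.name
        for m in catalog
        if keyword in m.description.lower() or keyword in [t.lower() for t in m.tags]
    ]


customer_view = build_customer_360(marketing, finance, crm)
matches = search_catalog(catalog, "customer")
```

Even at this toy scale, the catalog is what lets someone outside the team that built `customer_360` discover that the data set exists, who owns it and which systems feed it.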
Q. Is Data Governance the Achilles Heel of Your Data Lake?
Data governance refers to the overall management of availability, usability, integrity, and security of enterprise data. For organizations failing to provide good data governance, data has the potential to become a major liability.
Since the Global Financial Crisis (GFC), financial institutions have been under far greater government scrutiny. As a result, the bar has been raised in terms of the IT and data governance measures required to meet regulations. For example, a US bank recently settled a multi-million dollar penalty with the SEC because of its failure to enforce policies and procedures to prevent and detect false securities transactions (involving the misuse of material and non-public information). As another example, in healthcare, failure to comply with HIPAA, which protects sensitive patient data, can result in a multi-million dollar penalty. Without strong data governance over your data lake, such compliance rules will be impossible to meet.
Data governance is also the foundation for enabling company-wide data citizens, not only IT or data scientists, to access and use reliable data to create actionable insights in a self-service approach. Only when there’s proper data governance in place can your entire organization fully benefit from the enterprise data lake.
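As a rough illustration of governed self-service access, the Python sketch below masks any column a role is not entitled to see and records each read in an audit log. The roles, column names, `POLICY` table and `governed_read` helper are all hypothetical; this is a simplified sketch of the idea, not a compliance-grade control and not how any particular product implements governance.

```python
# Hypothetical policy: the columns each role may see in clear text.
POLICY = {
    "data_scientist": {"patient_id", "diagnosis", "age"},
    "marketing_analyst": {"age"},
}

MASK = "***"
audit_log = []  # every read is recorded so that access can be reviewed later


def governed_read(rows, role, policy=POLICY):
    """Return copies of the rows with every column the role may not see masked."""
    allowed = policy.get(role, set())  # unknown roles see nothing in clear text
    audit_log.append({"role": role, "rows_read": len(rows)})
    return [
        {col: (val if col in allowed else MASK) for col, val in row.items()}
        for row in rows
    ]


patients = [{"patient_id": "P-001", "diagnosis": "asthma", "age": 42}]

analyst_view = governed_read(patients, "marketing_analyst")
unknown_view = governed_read(patients, "intern")
```

Defaulting unknown roles to a fully masked view is a deliberate choice in this sketch: access has to be granted explicitly rather than assumed.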
To learn more about how Talend helps unlock the true value of the enterprise data lake, watch this short on-demand webinar.