E-Handbook: Enterprise data lakes hold the key to actionable insights Article 1 of 4

Big data's vast melting pot for business intelligence

Enterprise data lakes are the one repository designed to handle the seemingly limitless amounts of structured, semistructured, unstructured, streaming and batch data collected from countless tributaries. They have, by default, become the vast go-to reservoirs for incoming disparate data and outgoing enriched data, allowing data managers to govern systems, analysts to derive insights, data scientists to predict outcomes and executives to devise sound business strategies.

Data lake architectures can best justify their existence when they go well beyond simple storage to cost-effectively contribute intelligence to a company's operations and strategic planning through sophisticated search, modeling and analytics techniques.

"A data lake puts enterprise-wide information into the hands of many more employees to make the organization as a whole smarter, more agile and more innovative," wrote Carlos Maroto, functional and industry analytics manager at Accenture, in an April 2020 blog post. "Users from different departments potentially scattered around the globe can have flexible access to the data lake and its content from anywhere. This increases reuse of the content and helps the organization to more easily collect the data required to drive business decisions."

The road to democratization, reuse, exploration and analysis of data must first be paved by data ingestion, extraction, cleaning and integration; data set discovery and versioning; and metadata administration. But transforming data of all shapes and sizes into workable BI can be a costly fool's errand. Without proper implementation and management to ensure their integrity, data lakes can spawn fishy outcomes.

This handbook examines the key considerations, tools and techniques for implementing and managing enterprise data lakes, including platforms, performance, governance, scalability, accessibility, security and cost. We also explain how data fabrics can enhance the multidimensional attributes of data lake architectures to better align them with business goals.