How The Rise of Data Lakehouse Is a New Era of Data Value

By Swapnil Mishra - August 31, 2022 4 Mins Read

Enterprises can accelerate analysis and maximize data value at a lower cost by using query accelerators like Data Lakehouses.

Data Lakehouses combine the data warehouse and data lake into a single system of insight.

To enable Business Intelligence (BI) and Machine Learning (ML) on all data, a new, open data management architecture called a “data lakehouse” has been developed.

It combines the adaptability, cost-effectiveness, and scale of data lakes with data warehouses’ data management and ACID transactions.

As part of their data strategies, organizations have traditionally maintained two systems: a system of record for managing their operations, and a system of insight, such as a data warehouse, for gathering business intelligence (BI). As big data became more prevalent, a second system of insight, the data lake, emerged to provide insights from artificial intelligence and machine learning (AI/ML). However, many organizations are finding that relying on two different systems of insight is unworkable.

Moving data from the system of record to the data warehouse requires a time-consuming extract, transform, and load (ETL) process, after which the data is normalized and queried to obtain answers. Unstructured data, meanwhile, is deposited into a data lake and analyzed by skilled data scientists using specialized tools.

A growing number of businesses are discovering that lakehouses, which belong to the product category of query accelerators, are fulfilling a crucial need. A data lake typically receives structured data from a data warehouse. The lakehouse adds additional layers of optimization to make the data more accessible for gaining insights.

These new query acceleration platforms continue to evolve. Databricks and Snowflake have introduced data clouds and lakehouses with features built for the demands of businesses in particular industries, such as retail and healthcare.

Although data warehouses have a long history in decision support and business intelligence applications, using them to handle unstructured data, semi-structured data, and data with high variety, velocity, and volume was neither feasible nor cost-effective.

The emergence of Data Lakes

Data lakes then emerged to hold raw data in a variety of formats on cheap storage for data science and machine learning. However, they lack key capabilities of data warehouses: they do not support transactions, they do not enforce data quality, and their lack of consistency and isolation makes it nearly impossible to mix appends and reads, or batch and streaming jobs.

Data Lakehouse: Simplicity, Flexibility, and Low Cost

Data lakehouses are made possible by a new, open system design that implements data structures and data management capabilities comparable to those of a data warehouse directly on the kind of inexpensive storage used for data lakes. With both combined into one system, data teams can work more quickly because they no longer have to access multiple systems to use their data. Data lakehouses also ensure that teams working on data science, machine learning, and business analytics projects have access to the most complete and current data available.
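To make the idea of warehouse-style guarantees on cheap storage more concrete, here is a minimal sketch in Python. It is not how any particular lakehouse product works. It assumes a plain directory stands in for low-cost object storage, and the class name `TinyTable`, the `_log` folder, and the JSON file layout are all hypothetical. The sketch shows two ideas: rows are checked against a schema before they are written, and a data file only becomes visible to readers once a numbered commit entry for it exists.

```python
import json
import os
import tempfile
import uuid


class TinyTable:
    """Toy table on a plain directory: data files plus an append-only commit log."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # hypothetical schema: column name -> Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _validate(self, rows):
        # Reject rows with missing/extra columns or wrong types (data quality).
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col!r} must be {typ.__name__}")

    def _versions(self):
        # Commit files are named like 00000000.json, 00000001.json, ...
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def append(self, rows):
        self._validate(rows)
        # Step 1: write the data file. Readers ignore it until it is committed.
        data_file = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_file), "w") as f:
            json.dump(rows, f)
        # Step 2: publish it by creating the next numbered commit entry.
        versions = self._versions()
        next_version = versions[-1] + 1 if versions else 0
        commit_path = os.path.join(self.log_dir, f"{next_version:08d}.json")
        # Mode "x" raises FileExistsError if another writer already took this version.
        with open(commit_path, "x") as f:
            json.dump({"add": data_file}, f)
        return next_version

    def read(self):
        # Readers follow the log, so they only see committed data files.
        rows = []
        for version in self._versions():
            with open(os.path.join(self.log_dir, f"{version:08d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


table = TinyTable(tempfile.mkdtemp(), {"order_id": int, "amount": float})
table.append([{"order_id": 1, "amount": 9.5}])
print(table.read())
```

A real system would also need to handle retries after a commit conflict, deletes and updates, and partially written commit entries, all of which this toy ignores.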

Common Two-Tier Data Architecture

In a two-tier data architecture, data is extracted, transformed, and loaded (ETL) from the operational databases into a data lake. This lake holds all of the company’s data in low-cost object storage and is formatted to work with popular machine learning tools, but it is frequently disorganized and neglected. A small portion of the crucial business data is then put through a second ETL step and loaded into the data warehouse for business intelligence and data analytics. Because of the numerous ETL steps, this two-tier architecture requires routine maintenance and frequently yields stale data. To enable BI and ML across the data in both systems, data teams must also connect them, which duplicates data, increases infrastructure and operational costs, and poses security risks.
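The staleness and duplication described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the sample order records, the `etl_to_lake` and `etl_to_warehouse` functions, and the idea of a single batch run are stand-ins chosen for illustration, not a description of any real pipeline.

```python
import copy

# Hypothetical operational records (the system of record).
operational_db = [
    {"order_id": 1, "amount": 20.0, "status": "paid", "note": "gift wrap"},
    {"order_id": 2, "amount": 35.5, "status": "pending", "note": ""},
]


def etl_to_lake(source_rows):
    """First ETL hop: copy every raw record into the lake."""
    return copy.deepcopy(source_rows)


def etl_to_warehouse(lake_rows):
    """Second ETL hop: load only a curated subset into the warehouse."""
    return [
        {"order_id": row["order_id"], "amount": row["amount"]}
        for row in lake_rows
        if row["status"] == "paid"
    ]


# One batch run: both hops execute once.
lake = etl_to_lake(operational_db)
warehouse = etl_to_warehouse(lake)

# A new paid order arrives after the batch has finished.
operational_db.append({"order_id": 3, "amount": 12.0, "status": "paid", "note": ""})

# Row counts in the source, the lake, and the warehouse now disagree.
print(len(operational_db), len(lake), len(warehouse))
```

Until both ETL hops run again, the lake and the warehouse each hold their own out-of-date copy, which is the maintenance and freshness burden a single lakehouse system aims to remove.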


AUTHOR

Swapnil Mishra

Swapnil Mishra is a Business News Reporter with OnDot Media. She is a journalism graduate with 5+ years of experience in journalism and mass communication. Previously Swapnil has worked with media outlets like NewsX, MSN, and News24.
