By ET Bureau - June 29, 2022
Databricks, the data and AI company and pioneer of the data lakehouse paradigm, today unveiled the evolution of the Databricks Lakehouse Platform to a sold-out crowd at the annual Data + AI Summit in San Francisco. New capabilities revealed include best-in-class data warehousing performance and functionality, expanded data governance, new data sharing innovations including an analytics marketplace and data clean rooms for secure data collaboration, automatic cost optimization for ETL operations, and machine learning (ML) lifecycle improvements.
“Our customers want to be able to do business intelligence, AI, and machine learning on one platform, where their data already resides. This requires best-in-class data warehousing capabilities that can run directly on their data lake. Benchmarking ourselves against the highest standards, we have proven time and again that the Databricks Lakehouse Platform gives data teams the best of both worlds on a simple, open, and multi-cloud platform,” said Ali Ghodsi, Co-founder and CEO of Databricks. “Today’s announcements are a significant step forward in advancing our Lakehouse vision, as we are making it faster and easier than ever to maximize the value of data, both within and across companies.”
The Best Data Warehouse is the Lakehouse
Organizations like Amgen, AT&T, Northwestern Mutual and Walgreens are moving to the lakehouse because of its ability to deliver analytics on both structured and unstructured data. Today, Databricks unveiled new data warehousing capabilities in its platform to further enhance analytics workloads.
Data Governance Highlighted as a Top Priority with Advanced Capability for Unity Catalog
Unity Catalog, generally available on AWS and Azure in the coming weeks, offers a centralized governance solution for all data and AI assets, with built-in search and discovery, automated lineage for all workloads, and the performance and scalability a lakehouse needs on any cloud. Databricks also introduced data lineage for Unity Catalog earlier this month, significantly expanding data governance capabilities on the lakehouse and giving businesses a complete view of the entire data lifecycle. With data lineage, customers gain visibility into where data in their lakehouse came from, who created it and when, how it has been modified over time, how it is being used across data warehousing and data science workloads, and much more.
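Answering "where did this data come from?" is, at its core, a walk over a dependency graph. The sketch below illustrates only that idea in plain Python; it is not Unity Catalog's data model or API (Unity Catalog captures lineage automatically). The `LINEAGE` map, the table names, and the `upstream` helper are all hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each table maps to the tables it was derived from.
# Names are illustrative only; this is not how Unity Catalog stores lineage.
LINEAGE: dict[str, list[str]] = {
    "gold.daily_revenue": ["silver.orders", "silver.customers"],
    "silver.orders": ["bronze.raw_orders"],
    "silver.customers": ["bronze.raw_customers"],
    "bronze.raw_orders": [],
    "bronze.raw_customers": [],
}


def upstream(table: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every table that `table` transitively depends on (breadth-first)."""
    seen: set[str] = set()
    queue = deque(lineage.get(table, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already visited via another path
        seen.add(parent)
        queue.extend(lineage.get(parent, []))
    return seen


if __name__ == "__main__":
    # Prints the four bronze and silver tables feeding the gold table.
    print(sorted(upstream("gold.daily_revenue", LINEAGE)))
```

Reversing the edges in the same structure would answer the complementary question of which downstream tables and dashboards a change would affect.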
Enhanced Data Sharing Enabled By Databricks Marketplace and Cleanrooms
Databricks Marketplace, available in the coming months and positioned as the first marketplace for all data and AI, provides an open venue for packaging and distributing data and analytics assets. Going beyond marketplaces that simply offer datasets, Databricks Marketplace lets data providers securely package and monetize a range of assets such as data tables, files, machine learning models, notebooks and analytics dashboards. Data consumers can easily discover new data and AI assets, jumpstart their analysis, and gain insights and value from data faster. For example, instead of acquiring access to a dataset and investing their own time to develop and maintain dashboards that report on it, they can simply subscribe to pre-existing dashboards that already provide the necessary analytics. Databricks Marketplace is powered by Delta Sharing, which lets data providers share data without moving or replicating it from their cloud storage. This allows providers to deliver data to other clouds, tools, and platforms from a single source.
Databricks is also helping customers share and collaborate on data across organizational boundaries. Cleanrooms, available in the coming months, will provide a secure, hosted environment in which organizations can share and join data without replicating it. In media and advertising, for example, two companies may want to understand audience overlap and campaign reach. Existing clean room solutions have limitations: they are commonly restricted to SQL tools and risk duplicating data across multiple platforms. With Cleanrooms, organizations can collaborate with customers and partners on any cloud and give them the flexibility to run complex computations and workloads using both SQL and data science tools, including Python, R, and Scala, with consistent data privacy controls.
MLflow 2.0 Streamlines and Accelerates Production Machine Learning at Scale
Databricks continues to lead the way in MLOps innovation with the introduction of MLflow 2.0. Getting a machine learning pipeline into production requires setting up infrastructure, not just writing code, which can be difficult for new users and tedious for everyone at scale. MLflow Pipelines, made possible by MLflow 2.0, now handles these operational details for users: instead of setting up orchestration of notebooks, users define the elements of the pipeline in a configuration file and MLflow Pipelines manages execution automatically. Beyond MLflow, Databricks also added Serverless Model Endpoints to directly support production model hosting, as well as built-in Model Monitoring dashboards to help teams analyze real-world model performance.
Delta Live Tables Includes Industry-First Performance Optimizer for Data Engineering Pipelines
Delta Live Tables (DLT) is the first ETL framework to use a simple, declarative approach to building reliable data pipelines. Since its launch earlier this year, Databricks has continued to expand DLT with new capabilities, including a new performance optimization layer designed to speed up execution and reduce the cost of ETL. Additionally, the new Enhanced Autoscaling is purpose-built to scale resources intelligently with the fluctuations of streaming workloads, and Change Data Capture (CDC) support for Slowly Changing Dimensions (SCD) Type 2 tracks every change in source data for both compliance and machine learning experimentation purposes.
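SCD Type 2 keeps history by closing out the old version of a record and appending a new one, rather than overwriting it in place. The plain-Python sketch below shows only that general technique under simplified assumptions (a single change record at a time, exact attribute comparison); it does not use or reproduce the DLT API, and the `DimRow` and `apply_scd2` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    """One version of a dimension record in a hypothetical SCD Type 2 table."""
    key: str                   # business key, e.g. a customer ID
    attrs: dict                # tracked attributes, e.g. {"city": "Austin"}
    valid_from: date           # date this version became effective
    valid_to: Optional[date]   # None while this version is still current

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(table: list[DimRow], key: str, attrs: dict, effective: date) -> None:
    """Apply one change record in place, preserving full history (SCD Type 2)."""
    current = next((r for r in table if r.key == key and r.is_current), None)
    if current is not None and current.attrs == attrs:
        return  # no change in tracked attributes: keep the existing version
    if current is not None:
        current.valid_to = effective  # close the old version instead of overwriting it
    table.append(DimRow(key, dict(attrs), effective, None))


if __name__ == "__main__":
    dim: list[DimRow] = []
    apply_scd2(dim, "c1", {"city": "Austin"}, date(2022, 1, 1))
    apply_scd2(dim, "c1", {"city": "Denver"}, date(2022, 6, 1))
    for row in dim:
        # Both versions survive: the Austin row is closed, the Denver row is current.
        print(row.key, row.attrs["city"], row.valid_from, row.valid_to)
```

Because superseded rows are retained with their validity window, an auditor or a model-training job can reconstruct what the data looked like on any past date, which is the compliance and experimentation benefit described above.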