“When unified analytics has progressed to the point where citizen data scientists are not troubled by the “unknown unknowns” of data analytics, we will know that we have reached a new level of digital transformation that many are striving for,” says Luke Han, Co-Founder and CEO of Kyligence, in an exclusive interview with EnterpriseTalk.
ET Bureau: When it comes to data warehouse storage, has cloud made on-premises obsolete?
Luke Han: It may seem that way, but on-premises data warehousing is more likely to be transformed into a model more closely resembling the cloud analytics warehouse. There is still a powerful financial argument for running at least some of your workloads on-prem. In fact, a recent Andreessen Horowitz blog post notes that many companies are “repatriating” cloud workloads because they are finding that the cost of on-prem computing can be as much as half of what it costs to run the same workloads in the cloud.
Of course, data warehousing is a special case. It has long been a premium computing platform and has, therefore, been challenged by both cloud-native data warehouses and other open-source-based solutions in an attempt to extract some of the cost out of ETL and analytics workloads.
But the most likely scenario for the next decade will be the steady rise of hybrid- and multi-cloud strategies. Companies will continue looking to find the perfect balance of cost, control, ease of deployment and a host of other factors that aren’t solved by a pure cloud or pure on-prem strategy.
ET Bureau: What is the impact of the analytics warehouse on today’s analytics operations?
Luke Han: The analytics warehouse is an example of this evolution of the data warehousing concept. It signifies the broadening of the concept of analytics to include new data, new workloads, and many more users, whom we like to call citizen data analysts. This speaks to the notion that an increasing proportion of workers are becoming more familiar with, and dependent on, data to get their jobs done. The impact on analytics operations is enormous.
First, we all know that data volumes are growing, but to satisfy hundreds or thousands of citizen data analysts, the amount of structured or semi-structured data available for analytics must also greatly increase. That means that data engineering and data preparation will become even more critical than they already are, making these skill sets even more in demand. But the growing requirements of preparing data for thousands of citizen data scientists will make it financially and practically impossible to hire enough data engineers to meet this demand.
Some data engineering tasks will become automated and informed by machine learning and AI, including data governance, preparation, modeling, and insight generation. These skills and processes have been difficult to develop and staff, so it is only natural that they should be simplified and automated (and, therefore, be far less costly).
Second, the data platforms and the access and analytics engines themselves must become more and more automated and intelligent. That means these systems must become learning systems that are self-tuning and self-healing, and that evolve as the market, the company, and the data evolve.
Finally, many of the processes associated with analytics must be greatly simplified and clarified. Citizen data scientists shouldn’t need to know much about “data”; they should use “information” to form their insights. That is, they should be shielded from the complexities and vagaries of data management. The analytics warehouse should provide a clarifying semantic layer that separates the technical aspects of data engineering from the daily use of data as a business asset.
ET Bureau: How can unified analytics bridge the gap between data engineering and data analytics, allowing for concepts like accelerated AI and data-driven decision-making to become a reality for businesses?
Luke Han: Unified analytics speaks to the idea that applying automation and machine learning to both data engineering and data analytics processes will bring those processes closer together and make both more agile.
We have a new generation of information workers who have witnessed firsthand the thousands of little miracles that can happen with the proper application of data analytics. When unified analytics has progressed to the point where citizen data scientists are not troubled by the “unknown unknowns” of data analytics, we will know that we have reached a new level of digital transformation that many are striving for. That isn’t too far off.
ET Bureau: Semi-structured data is just as vital as transactional, structured data in the digital era. Unfortunately, neither the data lake nor the data warehouse is capable of handling both. So, what role does the unified analytics warehouse play here?
Luke Han: Pardon the play on words, but this is a matter of semantics. Enterprises already manage structured, semi-structured, and unstructured data. For cloud analytics and this notion of an analytics warehouse, the overall goal is to bring the processing to the data wherever it lives. The goal is to do this with minimum data movement, to avoid making duplicate copies of data, and to do the least amount of work to prepare data for processing.
The key enabling technology that will make this work is a common data semantic layer that can bridge the gap between analytics users and the increasing wealth of datasets that will continue to be produced. Beyond that, the unified analytics warehouse will have the flexibility to engage the right analytical approach to the data in its native format.
ET Bureau: Is this the right time to incorporate predictive analytics in warehouse management?
Luke Han: This is an inevitable step in the evolution of the analytics warehouse. Traditional ivory tower warehouses evolved at a relatively slow pace at first. This was a byproduct of the relatively high cost and exclusiveness of data warehousing for decision support. The analytics warehouse will be used by many more end users, those citizen data scientists. These new users will have higher expectations and a much greater appetite for innovation and for the disruption of traditional approaches. That is going to force the use of intelligent automation, which will inevitably involve predictive analytics to modernize analytics itself. So, of course, the analytics warehouse will be transformed by the types of analytics that it is created to deliver.
Luke Han is a Co-Founder and CEO of Kyligence, as well as a co-founder and Project Management Committee member of Apache Kylin. He was the first-ever top-level project VP of the Apache Software Foundation in China. He is a member of the Financial Technology Innovation Alliance and a member of the Digital Finance Working Committee of the Internet Society of China. He is also a Microsoft Regional Director (RD) and a Tencent Cloud Most Valuable Expert (TVP).