Yellowbrick CEO outlines hybrid path to cloud data warehouse
Neil Carson, co-founder and CEO of Yellowbrick, details the state of the data warehouse market and outlines use cases for hybrid on-premises and cloud deployments.
The data warehouse market has evolved in recent years as the cloud has become more prominent.
Data warehouses are foundational IT assets for many organizations, serving as repositories for data analysis and business intelligence. Organizations now have a variety of choices when it comes to data warehousing, including cloud, on-premises and hybrid models. Data warehouses also overlap with data lake offerings, in which organizations store unstructured data before drawing it into a more structured data warehouse.
Among the key vendors in the competitive data warehouse market is Yellowbrick Data, which is based in Palo Alto, Calif., and offers both on-premises and cloud data warehouse platforms as well as supporting hybrid deployment across both modalities.
In this Q&A, Neil Carson, the vendor's co-founder and CEO, discusses the effects of the COVID-19 pandemic on the tech sector, how the data warehouse business has changed since the company started in 2014 and why a cloud data warehouse alone is often not enough for many organizations.
What impact have you seen from the COVID-19 pandemic?
Neil Carson: What COVID-19 has done is force a reality check across people and across industries. It has made a lot of people think through what really matters at the end of the day and what we should be focusing on. More people are doing things online and consuming more technology. As that happens, it brings new opportunities for analytics in those lines of business. If you're selling to airlines right now, you're probably having a tough time. But the majority of businesses are going to come through this fine at the end of the day.
What has changed in the data warehouse market since you started the company in 2014?
Carson: I remember back when I was starting Yellowbrick, we were looking at how Hadoop was going to be the next big thing and it was going to change data warehousing and be a multi-billion-dollar industry. And we saw that that didn't really play out. I think that entire industry has ended up tiny in terms of revenue.
One of the big trends that has worked really well is obviously the cloud. We've seen people go from the view that the cloud is going to change everything and that they're going to move everything into it, to a much more pragmatic approach. We now see large enterprises doing hybrid cloud and private cloud. Not everything is going to go into the public cloud.
How have you seen the shift in demand from on premises to cloud for data warehouse?
Carson: When we started the company, cloud was something people were experimenting with. Snowflake hadn't come out yet, and Amazon Redshift was a fledgling offering. Now there's an incredible amount of competition in cloud data warehouses.
Google is putting tons of effort into BigQuery. Microsoft has an incredible product with Azure SQL Data Warehouse, and Snowflake is competing with Amazon on Amazon's own cloud, while Amazon continues to invest heavily in Redshift as well. So there's a massive amount of competition there now.
When you look at big industries like financial services and telecommunications, which are some of our largest verticals, many of our customers still build data centers. Most of the largest banks are still building tailored data centers, and telcos [telecommunication companies] are in the data center business. They've got new edge data centers being built for 5G now, which is really interesting and enables all sorts of new use cases. So what we see among certain verticals and large enterprises is that the future will be hybrid.
We started in the on-premises market, simply because it's the biggest market. Our business has grown and a lot of that has been driven by the on-premises business. But we now have customers doing hybrid cloud applications, as well as some pure cloud-only customers.
What are some of the use cases you've seen for hybrid data warehouses, with both on-premises and cloud components?
Carson: Availability is a key part of hybrid deployments, because organizations don't want a second system sitting idle on premises in a data center. So they keep that standby system in the cloud, spin it up and use it on demand.
There are other advantages as well. For example, most large enterprises are saddled with tons of legacy ETL [extract, transform, load]. ETL is the technology that moves data from mainframes and old databases into data warehouses for analytics. In moving to the cloud, they realize they need to figure out what to do with all of the ETL they are using. If you want to migrate your whole workload and all of that ETL from your other databases to the cloud, you have to spend a whole ton of money to re-engineer processes.
What we can do now is put an instance on premises to consume that legacy ETL, replicate the results of the ETL to the cloud and let users consume the data in the cloud. That's another really interesting hybrid use case. It lets customers move to the cloud quicker by keeping the legacy ETL in place while moving consumption to the cloud.
What is the intersection of cloud data warehouse and data lakes at this point?
Carson: Most large enterprises have built a data lake. What's happened with data lakes is that people really haven't gotten business value from them, because they were tools designed for programmers, not business users.
What we're finding now is that business consumption of data that was in data lakes is moving back to data warehouses again. Previously, that wasn't a fashionable thing.
I think when you talk to a cloud person today, what they mean by data lake is just that they haven't put the data into a schema yet and are simply dumping it there. But ultimately, to get value out of the data, it has to be structured with a schema.
Editor's note: This interview has been edited for clarity and conciseness.