Data management processes take on a new tenor in analyst's view
Changes in data management processes -- including self-service data preparation, data lakes and real-time analytics -- create a new landscape in companies, according to analyst Matt Aslett.
As costs for data storage go down and new tools proliferate, data management processes are undergoing significant changes. Self-service data preparation is creeping into view, and self-service analytics are becoming more prevalent.
As shown in a "Total Data Management" report published by 451 Research in February 2017, the effects are startling. Report co-author Matt Aslett, a research director at 451, recently provided a view on the trends that he sees rocking data management in an interview with SearchDataManagement.
Changes in data management processes seem to be coming from all directions. Can you help us set a baseline to see where we are coming from?
Matt Aslett: Historically, the role of data management was about getting data from enterprise applications into a data warehouse. Organizations spent a long time trying to achieve the magical, single data warehouse. They failed along the way.
They ended up with numerous data marts and data warehouses. Things were very much IT-driven. IT had the purse strings and controlled access to those data warehousing environments in terms of configuring them and doing all the modeling upfront.
Then we saw a lot of cases where lines of business wanted to be more agile, more responsive to change and to deliver more value for the larger business. In terms of analytics projects, people were hamstrung by their reliance on traditional methodologies. It would take a long time to get something configured.
Let's fast forward. Now the watchword seems to be self-service on both ends of data management.
Aslett: Yes. You see the emergence of self-service data preparation with software by Trifacta, Paxata and many others. And you see the move toward self-service at the analytics layer. That is driven by software from Qlik, Tableau and others. You have changing data platforms on the one side, and changing analytics platforms on the other. You see shifting to self-service on both sides. And you see data integration being pulled at both ends.
Of course, you have other things going on, as well -- especially in terms of a move to real-time applications. And, on top of that, there is an increased role for developers in terms of actually defining data sources.
A focus on weekly and monthly BI reports seems to be giving way to focus on building more frequent, almost real-time analyses. How does that pan out in companies?
Aslett: People are finding themselves with more and more requirements for continuous data integration. That's really the key for me in terms of what we see that is different. That said, real time means different things to different people, and it can be mixed with historical data that may be batch processed. The key point is that the historical batch ETL [extract, transform and load] process is something companies can no longer rely on by itself.
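The contrast Aslett draws between nightly batch ETL and continuous integration can be sketched in a few lines of Python. This is a toy illustration under assumed names (`batch_etl`, `continuous_ingest`, a plain list standing in for the warehouse), not anything described in the interview; real pipelines would use tools such as Kafka or Spark.

```python
# Hypothetical sketch: the same extract/transform/load logic run two ways.

def transform(record):
    """Shared transform step: normalize dollar amounts to integer cents."""
    return {**record, "amount_cents": int(record["amount"] * 100)}

def batch_etl(rows):
    """Classic nightly batch: extract everything, transform, load in one pass."""
    extracted = [r for r in rows if r.get("amount") is not None]  # extract
    return [transform(r) for r in extracted]                      # transform + load

def continuous_ingest(stream, warehouse):
    """Continuous integration: each event is transformed and loaded on arrival."""
    for event in stream:
        if event.get("amount") is None:   # drop malformed events as they come in
            continue
        warehouse.append(transform(event))

warehouse = []
continuous_ingest(iter([{"amount": 10}, {"amount": None}, {"amount": 5}]), warehouse)
print(len(warehouse))  # 2 valid events loaded
```

The point of the sketch is structural: the batch version only produces results once the whole input is available, while the continuous version makes each event queryable as soon as it arrives.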
Let's go back to the notion of self-service, particularly with data preparation. It always seems that some assembly is required in data management processes. How real is the idea of full self-service for data preparation?
Aslett: It depends on the user or the use case more than on the technology. That is an important factor in the way things are evolving. There may be a set of users within an enterprise that the organization wants to have true self-service access to data. But there are going to be things that put limits on that, such as security or privacy. What we see evolving is a managed self-service approach, where organizations try to strike a balance: providing the right access to the right data to the right people for the right purpose, while still allowing them to collaborate.
In many cases, what we're seeing are small pockets of self-service data preparation within an analytics team. In some cases, it's very much limited to the data science group. Some vendors are trying to spread that to general business and data analysts. They're doing that through software that balances the benefits of self-service for agility with the needs for data governance.
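The "managed self-service" balance described above might be sketched as a thin governance layer sitting between users and the data. Everything here, including the role names, the column policy and the `self_service_view` helper, is a hypothetical illustration, not any vendor's API:

```python
# Hypothetical governance layer: users prepare data themselves, but a
# role-based policy limits which columns each role may see.
ROLE_COLUMNS = {
    "data_scientist": {"customer_id", "region", "spend", "email"},
    "business_analyst": {"region", "spend"},  # no personally identifiable fields
}

def self_service_view(rows, role):
    """Return rows filtered down to the columns the given role is allowed to see."""
    allowed = ROLE_COLUMNS.get(role, set())   # unknown roles see nothing
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

rows = [{"customer_id": 1, "region": "EMEA", "spend": 42.0, "email": "a@example.com"}]
print(self_service_view(rows, "business_analyst"))
# [{'region': 'EMEA', 'spend': 42.0}]
```

The design choice this models is the one Aslett describes: self-service agility for analysts, with governance enforced centrally rather than by each user.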
Where are we with the data lake now? When it first emerged as an alternative to the data warehouse, you noted that the term data lake was not the greatest analogy.
Aslett: It is gaining more clarity, but it is still somewhat confusing. Whether you think it's a good analogy or not, one of the problems with it as a term is that it doesn't define the technology, it defines a concept.
Conceptually, people get it. There is a growing understanding of what the value is compared to or as a supplement to the existing data warehouse. The data warehouse is designed for a specific purpose, and the data lake is designed to enable some agility and serve multiple purposes.
The challenge sometimes is that it doesn't define a specific technology -- there are multiple ways of doing it. Does a data lake need only to be on Hadoop? Could it be Hadoop plus storage? Is it cloud, as well? So there is still plenty of space for confusion, or at least interesting conversation.
Is a data lake plan one of the things that helps define data management strategy?
Aslett: We see this evolving. In particular, we find more people who either have the title of chief data officer or fill that role -- that is, somebody whose job is to define the strategy for data management and data use within the organization. What we see is that companies with someone in that role move faster to take a strategic view of the data lake and the way in which data is accessed.
There are others out there that have a more tactical view, where multiple analytics projects have multiple, smaller data lakes ongoing, and there is nothing wrong with that. But, where there is a chief data officer or a similar role, the job is to connect the dots.
That means to make strategic decisions around whether the organization is going to have, for example, five Hadoop deployments, versus one or two data lakes, and perhaps a data warehouse. It may mean deciding whether to have one supplier that covers the stack from top to bottom within a data lake, or to go with best-of-breed software where needed.
Our research showed that if people are taking a strategic view of data management, those decisions have to be made by a person or group with senior responsibility, rather than by someone responsible only for an individual project within the broader company.