
Information architecture applied to big data streaming, AI

New technologies challenge data professionals, but taking a step back can help them clear the hurdles. In this interview, consultant William McKnight takes a measured look at data streaming, GDPR and AI.

Data management expert William McKnight has long espoused a well-designed information architecture as an effective means of bringing order to a continuous procession of new data technologies.

In this Q&A, the second of two parts from an interview with the president of McKnight Consulting Group, he discusses what organizations need to do to adapt as AI and big data streaming rise in importance. McKnight also assesses how the European Union's new General Data Protection Regulation (GDPR) will affect data management processes in the organizations it covers.

Big data streaming is among the technology trends that seem to be calling on us to revisit essential information architecture principles. Why is it such an active area of interest now?

William McKnight: Streaming, in some cases, is perhaps the only way to deal with the velocity of data today. And it's a fundamentally different approach. I think it is going to be on par with ETL [extract, transform and load] in the not-too-distant future in terms of how we move data around because it can also be applied to most day-to-day data integration tasks.

It might be overkill -- it might not require that you go around and replace ETL methods with streaming across the organization or anything like that. But I think newer workloads that are dealing with big data really need a streaming type of approach.
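To make the batch-versus-streaming contrast concrete, here is a minimal Python sketch, not drawn from the interview, that computes per-sensor averages two ways. The record layout (`sensor`, `value`) and the function names `batch_etl` and `stream_process` are hypothetical. The batch version needs the complete dataset before it produces anything; the streaming version updates running state and emits a result as each event arrives.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout for illustration: {"sensor": <id>, "value": <reading>}

def batch_etl(records: List[dict]) -> Dict[str, float]:
    """Batch style: wait for the complete dataset, then transform it in one pass."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for r in records:
        totals[r["sensor"]] = totals.get(r["sensor"], 0.0) + r["value"]
        counts[r["sensor"]] = counts.get(r["sensor"], 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

def stream_process(events: Iterable[dict]) -> Iterator[Tuple[str, float]]:
    """Streaming style: update running state per event and emit a result immediately."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for e in events:
        s = e["sensor"]
        totals[s] = totals.get(s, 0.0) + e["value"]
        counts[s] = counts.get(s, 0) + 1
        yield s, totals[s] / counts[s]  # running average is available right away

if __name__ == "__main__":
    events = [
        {"sensor": "a", "value": 1.0},
        {"sensor": "a", "value": 3.0},
        {"sensor": "b", "value": 5.0},
    ]
    print(batch_etl(events))             # {'a': 2.0, 'b': 5.0}
    print(list(stream_process(events)))  # [('a', 1.0), ('a', 2.0), ('b', 5.0)]
```

Both approaches arrive at the same final averages; the difference is when results become available, which is the velocity point McKnight raises. A production pipeline would likely rely on a dedicated stream-processing platform rather than an in-process generator.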

Let's look beyond big data streaming. How that data is handled is of increasing concern, and there is new regulation on the horizon. How do you view the arrival of GDPR and Facebook's recent troubles over user data?


McKnight: GDPR signals a shift in data privacy control toward more regulation and transparency. It also points to a culture of privacy notices that now need to be in place. Hopefully, it also signals a move toward respecting people's fundamental rights.

Among the companies I meet [with], many are betting a little bit that they won't be among the first audited under this program. But it's getting to nitty-gritty time now. Strong data governance will take you a long way there; not all the way, but a long way toward satisfying your GDPR requirements.

Facebook's travails are a bit of a wake-up call. I think people are realizing that they have assets they can't touch or feel. We're used to thinking of our home or our car as an asset. But now, it's not just the dollars we have in the bank, which are largely electronic; it's also the data about us.

It sure didn't take long before the internet got abused in such a way that data started to leak out all over the place. Hopefully, we can turn this thing around and get past the massive breach, as I call it, that we had with Facebook.

Artificial intelligence, on one level, is robots and self-driving cars. It's also about data. What issues will data architects handling AI confront in the days to come?


McKnight: It's a difficult question. That's because the data for testing and training AI can come from all over the place. It can come from e-commerce; it can come from ERP, customer account data, call center recordings and CRM. It can come from IoT [the internet of things], streaming sensor data, or publicly available or third-party information. All of this can go into producing good AI algorithms.

So, for example, if you are looking at doing something like predictive maintenance, you need some good structured data -- time series and event data, graph data -- and unstructured data, such as text, image and sound data, as well. Flat files still play into this sometimes, though seldom; it's more often databases, HDFS [Hadoop Distributed File System], in-memory data for high-performance machine learning, and even text-based serializations, such as JSON and CSV, for format interoperability. Today, data architecture is the foundation for artificial intelligence. If you want to get there, you need to have great data.
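McKnight's mention of JSON and CSV as interchange formats can be illustrated with a short Python sketch that uses only the standard library `json` and `csv` modules. The field names (`machine_id`, `timestamp`, `vibration`) and the values are made up for illustration, standing in for the kind of time-series sensor events a predictive maintenance model might consume. The sketch shows the same records moving out to either text format and being read back intact.

```python
import csv
import io
import json

# Hypothetical time-series sensor events (illustrative values only).
readings = [
    {"machine_id": "pump-1", "timestamp": "2018-05-01T00:00:00Z", "vibration": 0.42},
    {"machine_id": "pump-1", "timestamp": "2018-05-01T00:01:00Z", "vibration": 0.47},
]

def to_json(rows):
    """Serialize records to a JSON array string."""
    return json.dumps(rows)

def to_csv(rows):
    """Serialize records to CSV text with a header row taken from the first record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def from_csv(text):
    """Parse CSV text back into records; CSV has no types, so convert numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "vibration": float(row["vibration"])} for row in reader]

if __name__ == "__main__":
    print(json.loads(to_json(readings)) == readings)  # True
    print(from_csv(to_csv(readings)) == readings)     # True
```

The trade-off shows up in `from_csv`: CSV carries no type information, so numeric fields come back as strings and must be converted explicitly, whereas JSON preserves numbers as numbers.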
