The benefits and challenges of augmented data discovery tools

Augmented data discovery tools enable users to gain faster insights into data, via automated data prep and pattern discovery, but they aren't without their challenges.

Augmented data discovery is an emerging BI capability for automatically preparing and organizing enterprise data for self-service BI. That preparation is particularly challenging for unstructured data from sources like email, social media channels, IoT feeds and customer service interactions.

Traditional BI tools have supported basic capabilities for joining, manipulating and transforming structured data. Augmented data discovery can build on these basic capabilities with augmented data preparation and automated pattern discovery for self-service BI, according to research firm Gartner Inc. Augmented data preparation streamlines processes for data profiling, managing quality, cleaning data, modeling, enriching and labeling metadata in a manner that supports reuse and governance. Automated pattern detection builds on traditional BI tools to support complex, large data sets with more than 10 columns.

Augmented data discovery focuses on providing insight for citizen data scientists. In Gartner's view, these tools are similar to, but distinct from, augmented data science platforms, which are used for building data inference models that can be embedded into apps. Augmented analytics tools also tend to include natural language query and natural language generation features. This ease of access promises many benefits, but enterprises also face several challenges in making the tools work well in practice.

Benefits: Access and understand new data, faster

Augmented data discovery reduces the time and complexity of deriving valuable insights from new data sets, especially unstructured ones. These tools often use cognitive services to scale more efficiently than manual processes. They can process up-to-the-millisecond data on the fly to derive insights instantly, said Stephen Blum, founder and CTO of PubNub, a data management API provider.

Other benefits of augmented data discovery tools include the following:

Gain insight on live conditions: "The ability to see and act on real-time conditions has only been available via very expensive, noninteractive dashboards that provide little value," said Mark Palmer, general manager of analytics at Tibco Software. Now, any user can use BI tools to visualize, understand and act on live IoT data, live geographic data or a live view of business transactions in just minutes. This makes real-time commerce and customer engagement possible.

Act on insights faster: Traditional BI tools were good at using data to course correct the business with minor adjustments. Augmented data discovery promises to make it easier to discover new insights that could guide more radical and impactful changes. Micha Breakstone, co-founder and head of R&D at Chorus.ai, a conversational analytics service, has been experimenting with methods such as anomaly detection and covert pattern recognition to discover deeper insights, in some cases proactively. "Additionally, actionability can be modeled across various predefined business dimensions to ensure business value of the insights," he said.

Turn data swamps into data lakes: Companies have used cloud storage and Hadoop technologies to store data sets in case they may be useful one day. But without a clear goal, data management architecture or governance strategy, it's easy for these data sets to grow out of control. "It's become a data swamp, not a data lake," Palmer said. "By applying AI and augmented data discovery, we begin to make algorithmic sense of unorganized data swamps."

Reduce technical hurdles: Augmented analytics reduces the burden of data profiling and data preparation when building reports. "Using augmented data discovery, more business users are able to discover data and gain insights from the data, even if they do not know how one data element is related to another data element," said Gal Ziton, CTO and co-founder of Octopai, a metadata management platform. For example, augmented data discovery could automatically join the multiple tables required to generate a report. If the customer identification table is named CUST ID in one system and C ID in another system, then augmented data discovery can help join these tables.
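To make the CUST ID and C ID example concrete, here is a minimal Python sketch of one way a tool might guess which fields line up and then join on them. It is an illustration only, not how Octopai or any other product works: the function names, the name-similarity heuristic, the 0.6 threshold and the sample rows are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a field name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def guess_join_key(left_columns, right_columns, threshold=0.6):
    """Return the (left, right) pair of field names that look most alike.

    Returns None when no pair reaches the (assumed) similarity threshold.
    """
    best_pair, best_score = None, threshold
    for left in left_columns:
        for right in right_columns:
            score = SequenceMatcher(None, normalize(left), normalize(right)).ratio()
            if score >= best_score:
                best_pair, best_score = (left, right), score
    return best_pair


def join_rows(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of dict rows on the given key fields."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[left_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data from two systems that name the ID field differently.
crm_rows = [{"CUST ID": 1, "name": "Ada"}, {"CUST ID": 2, "name": "Grace"}]
billing_rows = [{"C ID": 1, "amount": 40}, {"C ID": 3, "amount": 15}]

key_pair = guess_join_key(crm_rows[0].keys(), billing_rows[0].keys())
report_rows = join_rows(crm_rows, billing_rows, *key_pair) if key_pair else []
```

Matching on names alone is fragile, so a production tool would presumably also compare the values in each field and ask a person to confirm the suggested join.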

Challenges: Costly services need to earn trust

Augmented data discovery tools aren't without their obstacles, however.

"Processing data at scale can easily run up massive costs," PubNub's Blum said. Cognitive services, like IBM Watson and Amazon Comprehend, are incredibly powerful and easy to integrate through their APIs, but because they charge for each execution of the service, the monthly bill can grow quickly. It's important to not run the computation repeatedly on every piece of data, but only on the databases of unstructured data to be analyzed, he said.
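One simple way to avoid paying twice for the same computation is to cache results by content, sketched below in Python. This is an assumption-laden illustration rather than anything Blum described: `fake_sentiment` is a hypothetical stand-in for a per-call-billed service (it is not the IBM Watson or Amazon Comprehend API), and treating messages as identical after trimming whitespace and lowercasing is a choice made for the example.

```python
import hashlib


class CachedAnalyzer:
    """Wrap a per-call-billed analysis function so each distinct text is sent once."""

    def __init__(self, analyze):
        self._analyze = analyze
        self._cache = {}
        self.billed_calls = 0  # how many times the wrapped service was actually invoked

    def __call__(self, text):
        # Assumption: texts that differ only in case or surrounding whitespace
        # can share one result.
        normalized = text.strip().lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._analyze(text)
            self.billed_calls += 1
        return self._cache[key]


def fake_sentiment(text):
    """Hypothetical stand-in for a paid cognitive service call."""
    return "negative" if "refund" in text.lower() else "neutral"


analyzer = CachedAnalyzer(fake_sentiment)
messages = [
    "I want a refund",
    "Where is my order?",
    "i want a refund ",
    "I want a refund",
]
labels = [analyzer(message) for message in messages]
```

Here four messages result in only two calls to the wrapped function. In a real deployment the cache would likely live in a shared store rather than in process memory.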

Other challenges organizations may encounter with augmented data discovery include:

Building trust: Managers implementing augmented data discovery need to build trust in the resulting insights and reassure employees that they won't lose their jobs. John Hagerty, vice president of product management for business analytics at Oracle, said: "It's critical that organizations be prepared to work with business teams to generate that trust by proving out the choices made via algorithms, so they embrace -- not doubt -- the recommendations the system makes."

Organizations must address concerns that current jobs will change once augmented analytics projects are off the ground. Otherwise, employees may find creative ways to slow adoption of the tools, or worse. "Systems automatically doing the work that humans now do is frightening to some people," Hagerty said. Managers should find ways for these tools to assist people in their decision-making, rather than replace them. It's important to focus on how these tools can give employees more time to take the best actions to optimize performance.

Eliminating hidden biases: Augmented data discovery can introduce risks of weaving biases into the data models generated by citizen data scientists. Enterprise data can include collections of data tables with complex relationships. Accurately transforming these into a single table for generating insight often requires highly manual and time-consuming efforts from domain experts and data scientists, said Ryohei Fujimaki, CEO and founder of dotData, a data science automation platform.

This expertise is often required to narrow down hypotheses that are based on domain expertise and informed bias honed by years of experience. Some augmented data discovery tools are starting to use AI to help simplify the expertise required to create feature tables correlated with business outcomes. These can be enhanced with capabilities that generate natural language explanations and visual blueprints for data model features that are more understandable to business users.

Enabling the wrong users: The huge promise of abstracting insights away from the underlying technical aspects, so that users can elicit deep, actionable insights, is also a danger. "In the wrong hands, such high-power tools can be exploited either maliciously or through negligence to support pseudo-insights and harmful recommendations," Chorus.ai's Breakstone said. This could be particularly concerning for high-stakes insights, such as systems that analyze candidate data to recommend who should be accepted to medical school or that analyze patient data to recommend who receives the next available organ donation. "What happens when anyone can interact with such systems to derive the 'business insights' they are looking for?" Breakstone asked.

Setting a data governance plan: It's important to include a governance strategy around the data used in augmented data discovery. "If you create insights from unreliable data, you can't trust the results," said Marge Breya, chief marketing officer at MicroStrategy. Taking humans out of the data preparation and modeling processes can also put effective decision-making at risk, because biases introduced by the person who created the model may go unchecked.

Breya recommended that managers incorporate a consistent semantic layer for tying together disparate data sources. Furthermore, the data should be augmented with additional telemetry, such as user and system information. "If done well, augmented data discovery has the potential to create direct pathways to the insights every employee needs to make real-time, data-driven decisions that boost productivity," Breya said.
