Natural language query tools offer answers within limits

Natural language query tools could one day allow end users to do all their data analysis without the aid of data scientists, but at this point the capabilities remain limited.

Natural language query has the potential to put analytics in the hands of ordinary business users with no training in data science, but the technology still has a long way to go before it develops into a truly transformational tool.

Natural language query (NLQ) is the capacity to query data by simply asking a question, spoken or typed, in ordinary language rather than in code. Ideally, natural language query will empower a business user to do deep analytics without having to code.

That ideal, however, doesn't exist.

In its current form, natural language query allows someone working at a Ford dealership to ask, "How many blue Mustangs were sold in 2019?" and follow up with, "How many red Mustangs were sold in 2019?" to compare the two.

It allows someone in a clothing store to ask, "What's the November forecast for sales of winter coats?"

It is not, however, advanced enough to pull together unstructured data sitting in a warehouse, and it's not advanced enough to do complicated queries and analysis.

"We've had voice search, albeit in a limited capacity, for years now," said Mike Leone, senior analyst at Enterprise Strategy Group (ESG), an IT analysis and research firm in Milford, Mass. "We're just hitting the point where natural language processing can be effectively used to query data, but we're not even close to utilizing natural language query for complex querying that traditionally would require extensive back-end work and data science team involvement."

Similarly, Tony Baer, founder and CEO of the database and analytics advisory firm DBInsight, said that natural language query is not at the point where it allows for deep analysis without the involvement of data scientists.

"You can't go into a given tool or database and ask any random question," he said. "It still has to be linked to some structure. We're not at the point where it's like talking to a human and the brain can process it. Where we are is that, given guardrails, given some structure to the data and syntax, it's an alternative to structure a query in a specific way."

NLQ benefits

At its most basic level, business intelligence improves the decision-making process. And the more people within an organization able to do data-driven analysis, the more informed the decision-making process, not merely at the top of an organization but throughout its workforce.

Meanwhile, natural language query doesn't require significant expertise. It doesn't force a user to write copious amounts of code to come up with an answer to what might be a relatively simple analytical question. It frees business users from having to request the help of data science teams -- at least for basic queries. It opens analytics to more users within an organization.

"Good NLQ will help BI power users and untrained business users alike get to insights more quickly, but it's the business users who need the most help and have the most to gain," said Doug Henschen, principal analyst at Constellation Research. "These users don't know how to code SQL and many aren't even familiar with query constructs such as 'show me X' and 'by Y time period' and when to ask for pie charts versus bar charts versus line charts."

"Think of all the people who want to run a report but aren't able to do so," echoed Jen Underwood, founder and principal consultant at Impact Analytix, an IT consulting firm in Tampa, Fla. "There's some true beauty to the search. How many more people would be able to use it because they couldn't do SQL? It's simple, and it opens up the ability to do more things."

In essence, natural language query and other low-code/no-code tools help improve data literacy, and increasing data literacy is a significant push for many organizations.

That said, natural language query in its current form has limits.

"Extending that type of functionality to the business will enable a new demographic of folks to interact with data in a way that is comfortable to them," Leone said. "But don't expect a data revolution just because someone can use Alexa to see how many people bought socks on a Tuesday."

The limitations

Perhaps the biggest hindrance to full-fledged natural language query is the nature of language itself.

Without even delving into the fact that there are more than 5,000 languages worldwide and an estimated 200 to 400 alphabets, individual languages are complicated. There are words that are spelled the same but have different meanings, others that are spelled differently but sound the same, and words that bear no visual or auditory relation to each other but are synonyms.

And within the business world, there are often terms that might mean one thing to one organization and be used differently by another.

Natural language query tools don't actually understand the spoken or written word. They understand specific code and are programmed to translate a spoken or written query to SQL, and then translate the response from SQL back into the spoken or written word.
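
As a rough illustration of that translate-to-SQL-and-back loop, the following Python sketch handles a single fixed question pattern against an in-memory SQLite table. The table name, columns, sample rows and the question pattern are all assumptions invented for this example, and it is not a description of how any vendor's tool works.

```python
import re
import sqlite3

# Hypothetical supported pattern:
# "How many <color> <model>s were sold in <year>?"
PATTERN = re.compile(
    r"how many (?P<color>\w+) (?P<model>\w+?)s? were sold in (?P<year>\d{4})\??$",
    re.IGNORECASE,
)

def to_sql(question):
    """Translate a supported question into (sql, params), or return None."""
    m = PATTERN.match(question.strip())
    if m is None:
        return None  # outside the guardrails: no translation is attempted
    sql = "SELECT COUNT(*) FROM sales WHERE color = ? AND model = ? AND year = ?"
    params = (m["color"].lower(), m["model"].lower(), int(m["year"]))
    return sql, params

def answer(conn, question):
    """Run the translated query and phrase the result as a sentence."""
    translated = to_sql(question)
    if translated is None:
        return "Sorry, I can't answer that question."
    sql, params = translated
    (count,) = conn.execute(sql, params).fetchone()
    return f"Vehicles sold: {count}"

def demo_connection():
    """Build a tiny, made-up sales table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (color TEXT, model TEXT, year INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("blue", "mustang", 2019),
        ("blue", "mustang", 2019),
        ("red", "mustang", 2019),
        ("blue", "mustang", 2018),
    ])
    return conn

if __name__ == "__main__":
    conn = demo_connection()
    print(answer(conn, "How many blue Mustangs were sold in 2019?"))
    print(answer(conn, "Why did sales drop last quarter?"))
```

Any question that falls outside the one pattern is refused rather than guessed at, which mirrors the guardrails the analysts describe: the question only works when it maps onto a known structure.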

"Natural language query has trouble with things like synonyms, and domain-specific terminology -- the context is missing," Underwood said. "You still need humans for synonyms and the terminology a company might have because different companies have different meanings for different words."
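
One way to picture the human-maintained mapping Underwood describes is a small glossary that rewrites a company's own terms into the canonical terms its data uses before any query is built. The glossary entries and the function name below are hypothetical, and the Python sketch is only an illustration of the idea.

```python
import re

# Hypothetical, human-curated glossary: company jargon -> canonical term.
GLOSSARY = {
    "revenue": "sales",
    "turnover": "sales",
    "clients": "customers",
    "accounts": "customers",
}

def normalize(question, glossary=GLOSSARY):
    """Replace known synonyms with canonical terms; leave other words unchanged."""
    def swap(match):
        word = match.group(0)
        return glossary.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, question)

if __name__ == "__main__":
    print(normalize("What was turnover by clients in 2019?"))
```

Because another company might use "accounts" to mean ledgers rather than customers, a glossary like this has to be maintained per organization, which is the human involvement the quote points to.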

When natural language queries are spoken, accents can cause problems. And whether spoken or written, the slightest misinterpretation by the tool can result in either a useless response or, much worse, something incorrect.

"Accuracy is king when it comes to querying," ESG's Leone said. "All it takes is a minor misinterpretation of a voice request to yield an incorrect result."

Over the next few years, he said, people will come to rely on natural language query to quickly ask a basic question on their devices, but not much more.

"Don't expect NLQ to replace data science teams," Leone said. "If anything, NLQ will serve as a way to quickly return a result that could then be used as a launching pad for more complex queries and expert analysis."

While natural language query is held back now by the limitations of language, that won't always be the case. The tools will get more sophisticated and, aided by machine learning, will come to understand a user's patterns to better comprehend just what that user is asking.

"Most of what's standing in the way is a lack of experience," DBInsight's Baer said. "It's still early on. Natural language query today is far advanced from where it was two years ago, but there's still a lot of improvement to be made. I think that improvement will be incremental; machine learning will help."

Top NLQ tools

Though limited in capability, natural language query tools do save business users significant time when asking basic questions of structured data. And some vendors' natural language query tools are better than others.

Though it was one of the top BI vendors following its acquisition of Hyperion in 2007, Oracle lost momentum when data visualizations changed the consumption of analytics. Now that augmented intelligence and machine learning are central tenets of BI, however, Oracle is again pushing the technological capabilities of BI platforms. Oracle Analytics Cloud and Day by Day support voice-based queries, and Oracle's natural language query works in 28 languages, which Henschen said is the broadest language support available.

"Oracle raised the bar on natural language query a couple of years ago when it released its Day by Day app, which used device-native voice-to-text and introduced explicit thumbs-up/thumbs-down training," Henschen said.

Another vendor Henschen noted is Qlik, which advanced the natural language capabilities of its platform through its January 2019 acquisition of Crunch Data.

"A key asset was the CrunchBot, since rebranded as the Qlik Insight Bot," Henschen said.

He added that Qlik Insight Bot is a bot-building feature that works with existing Qlik applications, and the bots can subsequently be embedded in third-party applications, including Salesforce, Slack, Skype and Microsoft Teams.

"It brings NLQ outside of the confines of Qlik Sense and interaction with a BI system," Henschen said.

Tableau is yet another vendor attempting to ease the analytics process with a natural language processing tool. The company introduced Ask Data in February 2019, and its September 2019 update included the capability to embed Ask Data in other applications.

"When I think about designing a system and taking it the next step forward, Tableau is doing something. [It remembers if someone ran a similar query] and it gives guidance," Underwood said. "It has the information and knows what people are asking, and it can surface recommendations."

Baer similarly mentioned Tableau's Ask Data, while Leone said that the eventual prevalence of natural language query will ultimately be driven by Amazon Web Services, Google and Microsoft.
