Data science tools of the trade fill skills gap
Data science tools are becoming more intelligent and better at understanding intent and context. But don't stop advertising that data scientist job on the web just yet.
The data scientist remains one of the most in-demand jobs, and there is a growing number of data science programs across the country to teach techies the necessary skills. But by the time you gain the skills you need, the job and the data science tools may look much different than they do today.
Data scientist skills vary depending on the company, but they generally involve the ability to identify business needs and formulate problem statements based on business requirements. Data scientists can prepare the right data for analysis to meet those needs and help company leaders make decisions, according to Adrian Bowles, industry analyst and founder of Storm Insights, speaking in a Dataversity webinar called "The Disappearing Data Scientist."
"They can find the right tools, find the right approach to the data, analyze it in a way that makes sense and interpret that," Bowles said in the webinar. "A data scientist is a business storyteller who tells a narrative based on quantitative data."
Bowles likened the demand for data scientists to the need for programmers in the 1950s. By the 1960s and 1970s, there were major changes in the way programming was done.
Programmers haven't gone away -- they are still in high demand -- but the role itself and the way programming is done have evolved because the tools and techniques are at a higher level.
The industry has also seen tools being used by business users to generate code, removing some of the need for programming, Bowles said.
"Over the years, the skill level to write software has changed; you don't need to know as much about the underlying machine because everything is at a higher level of abstraction," he said. "The end result, if you measure [it] in terms of productivity and what is being delivered by programmers today, is much higher than what was being delivered by folks that knew more about the underlying structure years ago."
Data science tools augment, automate
Data growth combined with the data scientist shortage has spawned an environment in which executives expect line-of-business users to become adept at using self-service data science tools. On the supply side, AI technologies such as machine learning classification systems and natural language processing are maturing to the point where they can augment business analysis or automate processes, but the quality of augmentation and automation varies and, in some cases, still has a long way to go.
Consider these skills and data science tools available today that can augment or automate tasks.
Identify and interpret the business problems by talking to users. Conversations with users are largely in the purview of data scientists today, but there are many systems that use natural language processing to classify and understand user requests. One example of this is IBM Watson in healthcare, which knows when to ask for more data to help make diagnoses.
Identify and prepare data. This is increasingly solvable with off-the-shelf technologies that can model data and map it to see what needs to be done.
Data analysis. Data scientists determine the type of analyses necessary based on the problems they are trying to solve, a process that can be semi-automated today, Bowles said. Software and vendors that are beginning to meet the demand for self-service data science tools include IBM Watson Explorer and Watson Analytics, Microsoft Power BI, MicroStrategy, Oracle Data Visualization, Qlik Sense, SAP Lumira and SAP Analytics Cloud, SAS Visual Analytics, Sisense, Tableau Software, and Tibco Spotfire.
Data interpretation and storytelling. A good data scientist can go beyond the spreadsheet to tell the data story. There are interesting tools that can automate the process of turning data into a narrative. Microsoft Power BI, MicroStrategy, Qlik Sense and Tableau Software offer integration with both narrative science tools -- such as SAP Lumira and Sisense -- and automated insights tools -- like Tibco Spotfire.
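To make the idea of turning data into a narrative concrete, here is a minimal sketch in Python. It is a simple template-based approach written for illustration only; the `narrate` function is hypothetical and does not reflect how any of the vendors named above generate text.

```python
from statistics import mean


def narrate(metric, values):
    """Turn a list of numbers into a short plain-English summary.

    Hypothetical helper for illustration; not any vendor's API.
    """
    if len(values) < 2:
        raise ValueError("need at least two data points")
    first, last = values[0], values[-1]
    # Pick a verb from the direction of change between first and last value.
    if last > first:
        trend = "rose"
    elif last < first:
        trend = "fell"
    else:
        trend = "held steady"
    sentence = f"{metric} {trend} from {first} to {last}"
    # Report a percentage only when it is defined and there was a change.
    if first != 0 and last != first:
        pct = abs(last - first) / abs(first) * 100
        sentence += f", a change of {pct:.1f}%"
    sentence += f". The average over {len(values)} periods was {mean(values):.1f}."
    return sentence


print(narrate("Quarterly revenue", [100, 110, 105, 125]))
```

Commercial tools go much further than a fixed template, but the sketch shows the basic step: compute a few facts from the numbers, then express them as sentences.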
Choosing the right data science tools
The data science field is reaching a point where there is so much data, and so much expectation that we can analyze all of it, that people are trying to distribute the responsibility to machines, to self-service analytics tools and to people who consider themselves citizen data scientists.
In any field there is a selection of tools you'll use over your career. It's important to know which tools to use and when.
"It's the tools plus the knowledge; you can't have one without the other," Bowles said. "I go back to [the quote] that 'A fool with a tool is still a fool.' You need to have explicit knowledge to select the right tool at the right time."
Data science tools cover a broad range of capabilities, from data prep and machine learning algorithms to technical tools for clustering, recommendation, regression and statistical analysis. These tools will continue to evolve and diminish the need for some data science skills -- similar to what happened in the early days of programming.
The results of the data analysis, assuming you have it right, can enable more than just a visualization of the data. They can augment the data visualization tools that have been improving over the last decade.
"Now you can visualize [the data] and hear about it as a narrative, and that's a pretty important step forward," Bowles said. "But all that presupposes that the data it is working from is accurate.
"If you're doing analysis and your business' future depends on it, it doesn't matter if you have a great visualization or a great narrative if the data itself is wrong," he added.
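Bowles' point about accuracy can be illustrated with a basic sanity check run before any analysis or visualization. The following Python sketch is an assumption about what such a check might look like; the `check_rows` function, the field names and the allowed range are all hypothetical.

```python
def check_rows(rows, required, ranges):
    """Return a list of human-readable problems found in rows (a list of dicts).

    Hypothetical helper for illustration only.
    """
    problems = []
    for i, row in enumerate(rows):
        # Flag required fields that are absent or empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        # Flag numeric fields that fall outside the expected range.
        for field, (lo, hi) in ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not lo <= value <= hi:
                problems.append(f"row {i}: {field}={value} outside [{lo}, {hi}]")
    return problems


rows = [
    {"region": "East", "units": 120},
    {"region": "", "units": 95},
    {"region": "West", "units": -4},
]
for problem in check_rows(rows, required=["region", "units"], ranges={"units": (0, 10_000)}):
    print(problem)
```

Checks like these catch only obvious problems such as blanks and impossible values. Deciding what counts as a valid range still takes someone who understands the data.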
With that, Bowles recommends having data scientists on staff to train and oversee the end users who have been entrusted with the self-service data science tools. This can make the business users and the data scientists more productive.
"The tools are designed to be easy, but if you don't understand what you are doing, at least at a surface level of experimental design, probability and statistics, you aren't going to articulate your problem in a way that will get the right answer," Bowles said in the webinar. "You can't democratize data science by throwing tools at a problem when the problem is understanding."
So while data science tools are becoming more intelligent and getting better at understanding intent and context, they have some maturing to do. In other words, you can't pull data scientists out of your executive suite and replace them with AI technologies -- but tech can fill data science skills gaps.