The citizen data scientist comes of age
Business intelligence in an IoT-enabled world promises to confer competitive advantage, operational efficiency and automated support for revamped business models. Great! You get those benefits by using cloud computing, location intelligence, end-user data, advanced analytics, machine learning, AI and the like. No problem! To do this, you need data scientists. Oops!
Data scientists, despite holding what has been called the sexiest job of the 21st century, are still hard to come by. Our schools can’t produce enough of them to meet the need to find insights within the vast amounts of data we now have at hand. If you’re the CEO of a modern digital business, you need to find a way to scale insights without sending thousands of employees back to school to get PhDs in statistics.
For the last few years, tech innovators have been rising to this challenge by building tools that help subject matter experts (SMEs) come to the data science party. Gartner defines a citizen data scientist (CDS) as “a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.” There is still plenty of debate about whether citizen data scientists exist at all. No tool can magically turn an “ordinary citizen” into a data scientist, but that misses the point.
In a digital world, more people need to become stats-savvy. More and more subject matter experts are building, testing and sharing data science, machine learning and AI models. These new “business” citizens are cropping up all over: in algorithmic trading on Wall Street, among industrial IoT innovators in the manufacturing and energy industries, and in the 22 smart city initiatives announced around the world in 2017.
Most of these business citizens don’t call themselves citizen data scientists. Instead, they call themselves fraud investigators, digital marketers or drilling engineers. So, in terms of a title on a business card, citizen data scientists don’t exist. But wait! We shouldn’t throw the CDS baby out with the bathwater. We are, indeed, at the beginning of an innovation trend in which new tools bring more people under the data science tent. That’s good, and we should celebrate it as a huge advancement.
That trend runs on innovation in citizen data science technology. Citizen data science tools make it easier to create, understand, find, share, track, deploy and automate machine learning and statistical models, helping engineers, managers and business users wield data science to solve problems or improve products and processes. Here are a few examples of citizen data science tools welcoming SMEs to the data science party:
- Embedded models and algorithms. SMEs access sales and operational data via hundreds of business intelligence (BI) dashboards and applications. New BI tooling can provide one-click access to statistical models. For example, by embedding a predictive forecast algorithm in a particular dashboard, a product manager can apply statistical thinking to forecasts by changing the model’s parameters (seasonality, product mix, demographics, etc.) to mathematically estimate potential sales increases. They don’t have to understand the math; they just have to understand the tool and the factors at play in their own area of expertise.
- Data science sandbox. Some data science platforms provide SMEs with a drag-and-drop interface to explore, select and try out models created by data scientists. The tools let data scientists post custom models, which are vetted and packaged. SMEs can then explore a menu of algorithms, try them out and combine them with other models.
- Facebook for data science. An emerging class of data science tools fosters collaboration. These tools help SMEs and data scientists share, document and discuss statistical models relevant to specific use cases, and they are often integrated with a sandbox for SME experimentation.
- Enterprise project management and auditability. In some industries, it’s essential that the organization can manage, audit and trace which statistical models are used to make business decisions. In pharmaceuticals, for example, if a model is used to validate drug trial results, that model must be transparent should any issues arise concerning decisions made about the trial. These tools help organizations scrutinize models and enforce governance and control over them as their impact on business decisions grows.
- Intrinsic machine learning. In the future, pretty much every software application will learn as you use it. When machine learning is built into an app, you’re not really doing data science, but you are consuming it. All you’ll notice is that the application seems to become more useful (a good thing).
These technologies do not turn citizens into PhDs, nor do they aim to. But they do serve to convey capabilities and intelligence wrested from data science, AI and machine learning. And they are serving more and more of the workforce for increasingly varied purposes. That expands the utility of data and of data science so more people can make smarter decisions, and that’s good for citizens everywhere.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are those of the writers and do not necessarily convey the thoughts of IoT Agenda.