Bolster citizen data scientists with support, training

As more citizen data scientists take on work traditionally tasked to business analysts, organizations must consider how to support them. Start with a centralized data team.

Analysts and technology pundits will tell you that the path to digital resilience lies on a foundation of strong analytical capabilities. In fact, according to Gartner's 2019 CIO Agenda survey of 3,000 CIOs, data analytics and machine learning will be game changers for the coming year.

The trouble is that the true number crunchers -- business analysts and data scientists -- are a precious commodity these days.

"Most organizations rely on specialized data scientists, who are only a select few within the organization and, therefore, are stretched thin, resulting in bottlenecks across the organization," said Rami Chahine, vice president of product management at Datawatch Corp. "This can delay business decisions, impacting performance."

That's why more and more enterprises are trying to extend the reach of their analytical capabilities outside that sheltered pool of specialists. They're arming citizen data scientists with more data and more tooling so these business subject matter experts can design and run the reports they need in a timely fashion and make decisions based on data that is as close to real time as possible.

But this doesn't mean citizen data scientists can do all of the analysis themselves. Truly effective organizations recognize there are certain things these empowered users can and can't do through a self-service analytics program. The following are some good rules of thumb for organizations seeking to set up the right guardrails and mileposts for effective -- but realistic -- use of citizen data scientists.

Create a centralized data function

If organizations are to ensure that data accessed by citizen data scientists is clean, up to date and adheres to security and compliance best practices, then they need a backstop. That backstop should be some kind of centralized data team that acts as the referee or conductor, ensuring that the right data gets to the right types of users.

"Data teams have to protect organizations' data but can't clutch it so tightly that they're a blocker for the business," said Marie MacBain, vice president of research operations at G2 Crowd Inc., a review site for business software and services. "Establishing safety nets like limited permissions and clear data dictionaries will minimize risk while enabling users in the business to drive insights themselves."

Permissions management is a huge function here, MacBain said.

"Many of our users access data through dashboards which are controlled by our data team. They're able to make the right data accessible -- or inaccessible -- as appropriate," she said. "This way, team members are able to easily self-serve where it makes sense."

The central data department should be in charge of not only safeguarding data, but also curating data sets so that the most salient information is at the fingertips of those to whom it is relevant.

"What's key is that the proper infrastructure is in place to provide all of the data citizen users [who are] touching the data with governed data so they are comfortable running reliable reports based on organized and trusted data to impact critical business decisions," Chahine said.

This starts by defining goals and matching data with specific audiences, said Rob Perry, vice president of product marketing at ASG Technologies Group Inc. He also advised organizations to define "a curation workflow that makes sense for their data and audience."

"Once the groundwork is set, the focus can shift to using the data sets in a way that adds value, enabling those creating the data sets to think critically about the information they are accessing and to be selective about the quality of information they're incorporating," he said. "While not everything needs to be perfect, it's important that nothing in the data set is a mystery."

Encourage training and collaboration with BI analysts

Before citizen data scientists can be expected to incorporate curated data sets, however, they must be trained properly. It's important to remember that even when they're eager to learn, they often don't have the analytical educational foundation that a typical BI analyst has. This means you're looking at steeper learning curves for tooling and processes, said Jen Underwood, senior director of product marketing at DataRobot.

"Don't underestimate it. To be successful, you can't just tell the business users where the data sources reside and expect them to figure out the rest on their own," Underwood said. "Empowering citizen data science requires expert mentoring to succeed throughout all steps of a machine learning project lifecycle."

This means giving them a basic introduction to data science concepts and terminology, along with further training on the tools and on how to select the right data and analysis projects for their business needs.

It also means remembering where the expertise lies: Citizen data scientists have subject matter expertise, while existing data scientists and analysts have analysis expertise. The trick to getting the most out of self-service programs is to create a culture in which these two groups collaborate well.

"By pairing data engineers/analysts with employees from other departments, employees spend less time trying to figure out how to use the data analytics tool than they would if they were working on their own, enabling them to focus on learning how to use the data to tell a story," Perry said.

He's not alone in that belief.

"We should also not be expecting our citizen data scientists to understand algorithms and data models," said Tom Wilde, CEO at Indico Data Solutions Inc. "We should leverage them instead to help identify and label the right data inputs so that everyone can agree on what represents the right and wrong outcomes in the context of the business goal at hand."

Dig Deeper on Business intelligence management