Government IT pros: Hiring data scientists isn't an exact science

There's no such thing as a data science unicorn, U.S. government experts caution. You'll need more than programmers and technical expertise on your data science team.

WASHINGTON, D.C. -- Government agencies face the same problems as enterprises when it comes to turning their vast data stores into useful information. In the case of government, that information is used to provide services such as healthcare, scientific research, legal protections and even to fight wars.

Public sector IT pros at the Veritas Public Sector Vision Day this week talked about their challenges in making data useful and keeping it secure. A major part of their work currently involves finding the right people to fill data analytics roles, including hiring data scientists. They described data science as a combination of roles that require technical as well as subject matter expertise, which is why it often takes a diverse team to succeed.

Tiffany Julian, data scientist at the National Science Foundation, said she recently sat in on a focus group involved with the Office of Personnel Management's initiative to define the data scientist role.

"One of the big messages from that was, there's no such thing as a unicorn. You don't hire a data scientist. You create a team of people who do data science together," Julian said.

Julian said data science includes more than programmers and technical experts. Subject experts who know their company or agency mission also play a role.

"You want your software engineers, you want your programmers, you want your database engineers," she said. "But you also want your common sense social scientists involved. You can't just prioritize one of those fields. Let's say you're really good at Python, you're really good at R. You're still going to have to come up with data and processes, test it out, draw a conclusion. No one person you hire is going to have all of those skills that you really need to make data-driven decisions."

Wanted: People who know they don't know it all

Because she is a data scientist, Julian said others in her agency ask what skills they should seek when hiring data scientists.

"I'm looking for that wisdom that comes from knowing that I don't know everything," she said. "You're not a data scientist, you're a programmer, you're an analyst, you're one of these roles."

Tom Beach, chief data strategist and portfolio manager for the U.S. Patent and Trademark Office (USPTO), said he takes a similar approach when looking for data scientists.

"These are folks that know enough to know that they don't know everything, but are very creative," he said.

Beach added that when hiring data scientists, he looks for people "who have the desire to solve a really challenging problem. There is a big disconnect between an abstract problem and a piece of code. In our organization, a regulatory agency dealing with patents and trademarks, there's a lot of legalese and legal frameworks. Those don't code well. Court decisions are not readily codable into a framework."

'Cloud not enough'

Like enterprises, government agencies also need to get the right tools to help facilitate data science. Peter Ranks, deputy CIO for information enterprise at the Department of Defense, said data is key to his department, even if DoD IT people often talk more about technologies such as cloud, AI, cybersecurity and the three Cs (command, control and communications) when they discuss digital modernization.

"What's not on the list is anything about data," he said. "And that's unfortunate because data is really woven into every one of those. None of those activities are going to succeed without a focused effort to get more utility out of the data that we've got."

Ranks said future battles will depend on the ability of forces on land, air, sea, space and cyber to interoperate in a coordinated fashion.

"That's a data problem," he said. "We need to be able to communicate and share intelligence with our partners. We need to be able to share situational awareness data with coalitions that may be created on demand and respond to a particular crisis."

Ranks cautioned against leaning too heavily on the cloud for data science. He described cloud as the foundation at the bottom of a pyramid, with software in the middle and data on top.

"Cloud is not enough," he said. "Cloud is not a strategy. Cloud is not a destination. Cloud is not an objective. Cloud is a tool, and it's one tool among many to achieve the outcomes that your agency is trying to get after. We find that if all we do is adopt cloud, if we don't modernize software, all we get is the same old software in somebody else's data center. If we modernize software processes but don't tackle the data ... we find that bad data becomes a huge boat anchor that all those modernized software applications have to drive around. It's hard to do good analytics with bad data. It's hard to do good AI."

Beach agreed. He said cloud is "100%" part of USPTO's data strategy, but so is recognition of people's roles and responsibilities.

"We're looking at not just governance behavior as a compliance exercise, but talking about people, process and technology," he said. "We're not just going to tech our way out of a situation. Cloud is just a foundational step. It's also important to understand the recognition of roles and responsibilities around data stewards, data custodians."

This includes helping ensure that people can find the data they need, as well as denying access to people who do not need that data.

Nick Marinos, director of cybersecurity and data protection at the Government Accountability Office, said understanding your data is a key step in ensuring data protection and security.

"Thinking upfront about what data do we actually have, and what do we use the data for are really the most important questions to ask from a security or privacy perspective," he said. "Ultimately, having an awareness of the full inventory within the federal agencies is really the only way that you can even start to approach protecting the enterprise as a whole."

Marinos said data protection audits at government agencies often start with looking at the agency's mission and its flow of data.

"Only from there can we as auditors -- and the agency itself -- have a strong awareness of how many touch points there are on these data pieces," he said. "From a best practice perspective, that's one of the first steps."