How to structure and manage a data science team
Data science teams typically include various analytics and data professionals and can be set up in different ways, as explained here along with tips on managing them.
Organizations increasingly see data as a valuable asset that will help them succeed now and in the future. In one 2019 survey of BI and analytics professionals, a combined 94% of the 500 respondents cited data and analytics as very important or somewhat important contributors to business growth and digital transformation strategies in their organizations.
The survey, conducted for BI software vendor MicroStrategy and detailed in a report titled "2020 Global State of Enterprise Analytics," also found that 59% of those organizations were moving forward on advanced and predictive analytics applications -- the realm of data science. That was up seven percentage points from a similar survey done a year earlier.
Survey respondents listed security and privacy concerns as the No. 1 barrier to more effective use of data and analytics, but other top challenges included limited access to siloed data, the lack of qualified talent, insufficient employee training and the absence of an analytics strategy. To help turn data into actionable information, more and more organizations are creating data science teams to lead their efforts in areas such as data mining, predictive modeling, machine learning and AI.
Let's look at best practices for structuring and managing a data science team, including the different ways one can be set up, the positions it's likely to include and the executives who a team may report to in an organization.
Different models for structuring a data science team
"To some degree, building and maintaining a strong data science team is an art," said John Bottega, president of the EDM Council, a global association that focuses on data management best practices, standards and training.
Responsibilities for data collection, management and analysis once typically fell under the CIO, whose IT team worked with business users to implement data warehouses and BI systems to hold and organize data and do basic analysis and reporting. However, over the past two decades, more organizations separated the data function into its own department as the amount of internal data stores grew, supporting technologies evolved and data-related tasks became more differentiated and specialized.
The increasing importance of advanced analytics to business success also drove the need for a data science team with skilled data scientists and other workers. Today, many organizations have a team or an entire data science department; larger ones may have multiple teams that operate independently or in a coordinated way.
How companies structure their teams varies based on the maturity of their data science program, as well as their data analytics goals, overall organizational structure and enterprise culture. However, some common models on data science team structure have emerged, with each having pros and cons. Team structures can be:
- Decentralized. Data science team members work within the individual business units they support. This allows team members to closely collaborate with business executives and workers on data science projects, but it can hinder the strategic use of data across an organization and require more resources than smaller companies may have available.
- Centralized. The data science function is consolidated at the enterprise level under a single manager, who assigns team members to individual projects and oversees their work. This model more easily allows for an enterprise-wide strategic view and uniform implementation of analytics best practices, but it can limit the ability of team members to become experts in a particular area of the business. Some organizations establish a formal data science center of excellence to house a centralized team.
- Hybrid. The data science team is managed centrally but members are assigned to work with specific business operations and are accountable for helping those units reach their objectives to make data-driven decisions. In hybrid structures, a center of excellence may also focus on promoting data science best practices and standards. As with the decentralized model, resource constraints can be an issue.
Data science team roles and responsibilities
There are some common elements that a data science team must have to be successful.
"Regardless of industry, data science teams need to be strong in three core areas: mathematical, technology and business acumen," Bottega said. "Finding a single person that excels in all three is quite rare. Many companies will have someone that is fluent in two of three and then the rest of the team can be built around that, filling in the gaps to ensure the team as a whole is strong in all three."
Small organizations or those with limited analytics needs or early-stage data science initiatives may have a generalist handle all the required tasks. Larger entities, as well as those with more mature programs, typically include some combination of the following roles in their data science teams.
Data scientist. As the title indicates, data scientists are the core members of a team. They use statistical methods, machine learning algorithms and other tools to analyze data and create predictive models; some also build data products, recommendation engines, chatbots and other technologies for various use cases. Data scientists typically have a variety of skills in areas such as mathematics, statistics, data wrangling, data mining, coding and predictive modeling, as well as business knowledge and communication and collaboration skills. Increasingly, they also have advanced data science degrees or graduate-level data science certifications.
Data analyst. A data analyst doesn't have the full skill set of a data scientist but can support data science efforts. The main responsibilities of data analysts are to collect and maintain data from operational systems and databases, use statistical methods and analytics tools to interpret the data, and prepare dashboards and reports for business users.
Data engineer. Data engineers are responsible for building, testing and maintaining data pipelines; they generally have a background in software engineering or computer science that suits their focus on the technology infrastructure and data collection, management and storage. They also often work closely with data scientists on data quality, data preparation and model deployment and maintenance tasks.
Data architect. A data architect designs and oversees the implementation of the underlying systems and data infrastructure that the team uses. In some cases, a data engineer might also handle this role.
Machine learning engineer. Also sometimes called an AI engineer, this position works in conjunction with data scientists to create, deploy and maintain the algorithms and models needed for machine learning and AI initiatives.
In some organizations, data science teams may also include these positions.
Citizen data scientist. An informal role, this can involve business analysts, business-unit power users and other employees who are capable of doing their own data analytics work. Citizen data scientists often have an interest in, acumen for, or some training on advanced analytics, although the technologies they use -- for example, automated machine learning tools -- typically require little to no coding. They often work outside of a data science team but may be incorporated into ones that are embedded in business units.
Business analyst. In some cases, business analysts may be members of a data science team in their regular role, which includes evaluating business processes and translating business requirements into analysis plans -- areas in which they can help support the work of data scientists.
Data translator. A new addition to the roster, these professionals -- also known as analytics translators -- act as a liaison between data science teams and business operations and help plan projects and translate the insights gleaned from data analytics into recommended business actions.
Data visualization developer or engineer. They're tasked with creating data visualizations to make information more accessible and understandable for business professionals. However, data scientists and data analysts may handle this role themselves on some teams.
Who manages and oversees data science teams?
A team may be led by a director of data science, data science manager, lead data scientist or similar managerial position. The reporting structure for teams similarly varies. Generally, though, organizations assign either a C-level executive or high-ranking functional manager to oversee the data science team.
The most visible data science leader is the chief data officer. The CDO position dates back to 2002, with financial services firm Capital One widely recognized as the first company to implement the role. Many others have since followed suit: Data and analytics consultancy NewVantage Partners reported that 65% of 85 large companies it surveyed in 2020 had CDOs, up from 12% when it first did the annual survey in 2012. Initially focused primarily on data governance, management and security functions, many CDOs have now also taken on responsibility for data science, analytics and AI.
Other organizations have created a chief analytics officer (CAO) role to oversee their data science and analytics teams, while some have combined the CDO and CAO positions into a single chief data and analytics officer. In addition, the head of a data science team may report to a different executive -- for example, the COO, CFO or CIO, or a position such as vice president of analytics, vice president of business data or director of data and strategy.
How data scientists work with business users
Organizations of all kinds are striving to become data-driven, and with good reason: Many see it as a key to remaining competitive in the digital age. To that end, data science teams should work collaboratively with business managers to:
- Clearly understand the business questions they want the team to answer.
- Articulate the objectives they have for using the information it provides.
- Map out how to apply the information to make decisions and take actions.
"Data scientists need to work closely with the business unit to understand how the data they provide helps drive the business and understand exactly what [business users] need out of the data," said Josh Drew, Boston-area regional vice president for Robert Half Technology and The Creative Group, two units of staffing firm Robert Half International Inc.
Once they have that understanding, data science teams cannot merely present their findings. They must help their business colleagues understand the insights gained from the data and how that information can shape product and services offerings, marketing campaigns, supply chain management and other key parts of business operations to support enterprise goals, such as higher revenue, increased efficiency and better customer service.
Tools that a data science team needs
Dozens of tools, ranging from data visualization and reporting software to advanced analytics, machine learning and AI technologies, enable the work that data science teams do. The number and combination of technologies needed is unique to each team, based on its goals and skill levels.
The following is a list of commonly used data science tools that includes both commercial and open source technologies:
- statistical analysis tools, such as SAS and IBM SPSS;
- machine learning frameworks and libraries, including TensorFlow, Weka, Scikit-learn, Keras and PyTorch;
- data science platforms from various vendors that provide diverse sets of capabilities for analytics, automated machine learning, and workflow management and collaboration;
- programming languages, in particular Python, R, Julia, SQL, Scala and Java;
- Jupyter Notebook and other interactive notebook applications for sharing documents that contain code, equations, comments and related information;
- data visualization tools and libraries, such as Tableau, D3.js and Matplotlib;
- Spark, Hadoop and other big data platforms and analytics engines, as well as cloud object storage services and NoSQL databases; and
- the Kubernetes container orchestration service for deploying analytics and machine learning workloads in the cloud.
Best practices for managing a data science team
Executives and team leaders who are seeking to build and mature their data science programs should consider the following best practices for managing their teams.
- Seek out workers with a range of business and interpersonal skills in addition to technical ones to help ensure that the team can meet organizational objectives.
- Create a culture of learning and innovation that challenges team members and encourages them to bring new thinking to business problems and issues.
- Promote analytics projects that encourage close collaboration between the data science team and the business units they support.
- Evaluate team members at least in part on the business successes their work drives.
- Develop a mentorship program to help advance the skills of junior team members, and do ongoing training to ensure that all workers stay current on key data science techniques and technologies.
- Because data scientists are in high demand and experienced ones have plenty of job opportunities, design a talent management program to help keep them and other team members from leaving.