Merging DevOps and machine learning requires restructuring

Companies that restructure to merge their traditional DevOps teams with their machine learning efforts, and so make machine learning more accessible, need to include voices from multiple teams.

The idea behind combining machine learning engineering with DevOps is to bring machine learning engineers and their work together with traditional software engineers in order to move R&D into production. While machine learning operations (MLOps) is a rising trend, getting the culture and the transition to MLOps teams right can be difficult.

There are several challenges in AI and machine learning development, and ensuring proper communication and company focus can be crucial to a company's success. Some companies may require organizational changes in order to better empower an MLOps team.

Dillon Erb, CEO and co-founder of Paperspace, an AI and machine learning development platform, discusses the rise of MLOps, the organizational changes required for proper integration and what the future may hold for the field.

What is MLOps and why is it developing?

Dillon Erb: MLOps represents the merging of machine learning engineering efforts with more traditional DevOps teams. It encompasses the tools and systems that allow teams to integrate machine learning efforts and engineers with traditional software engineers in order to move initiatives from R&D into production environments. Historically, these were two different groups. MLOps is motivated by the idea that when you collect a lot of data, you can create powerful predictive models that can function like any other software application in an organization.

Machine learning is still new in the sense that many of the frameworks and techniques behind cutting-edge applications such as deep learning have emerged in the past five years. Many companies and research groups have been investing in deep learning technology, but integrating that with existing systems is still difficult.

It's happening now because, over the past few years, things have been moving at a breakneck pace in terms of advances in the machine learning domain, with new frameworks and technologies -- e.g., PyTorch, TensorFlow, reinforcement learning and synthetic data -- all merging. So now the big question becomes: How do we integrate it with everything else?

Oftentimes there are communication issues between data science teams and operations and production teams. Who would the MLOps team be under?

Erb: The chief AI officer is largely the person today who is responsible for architecting these systems and merging these two worlds. Ultimately, companies have teams that are responsible for monitoring and deploying code to production. They need to be part of this conversation. Chief AI officer is a new title that means a lot of things. We've seen it used everywhere from the sales organization to the office of the CTO.

Depending on where it fits within an organization, it could fall under their purview to run the MLOps efforts. Generally, what we see is that chief AI officers are also being tasked with figuring out what the applications for this technology are within the company -- in other words, figuring out what we can do with this technology as opposed to how we will do it. Once it gets down to how we do it, you need to bring in the existing teams that build out the production applications.

Is this an organizational problem, a technology problem, or both?

Erb: I'd say it's both. On the technology side, which is the most obvious, it boils down to what programming languages and techniques are used. In machine learning, the dominant language is Python, while traditional software engineering is dominated by JavaScript, Go, Ruby, etc. In other words, you have technologies that don't really interface, which creates an issue. We hear stories all the time about machine learning groups building a model in Python and then handing it off to a DevOps team that basically rewrites the entire thing in Java or another technology they are more familiar with.

On the organizational side, because machine learning is still an emerging technology, it's not completely obvious where it sits within a larger organization. Is a machine learning team distinct from a data science team? Who does it report to? Our view is that it should be part of the traditional software engineering organization, but the tools aren't there yet and that's what we're working on.

Where do you see DevOps and machine learning 10 years from now?

Erb: What we see as the biggest outcome of the collective efforts of all the companies working in the space is that machine learning will no longer be a distinct entity within an organization, but instead will be directly integrated. That's happening in two directions. On one side, companies that are building out custom machine learning models as part of their internal IP are able to embed those at an organizational level. On the other side, it means companies that might have a thousand software engineers and maybe only 10 dedicated machine learning engineers today can use tools like Gradient to turn every one of those developers into a machine learning engineer. That's what we're doing: taking the best of the machine learning world and making it operational within the software development process.

One analogy that comes to mind is that 15 years ago, companies had a separate web team and a mobile team. Now the common model is just a software engineering team that does both web and mobile applications. The convergence of those two worlds is what's at stake here. That's why we believe it's so important for companies to invest today in modern software development tools that are built with a machine learning perspective.
