Best practices for getting started with MLOps
As AI and machine learning become increasingly popular in enterprises, organizations need to learn how to set their initiatives up for success. These MLOps best practices can help.
With the recent and rapid influx of AI and machine learning across enterprises, teams looking to implement new tools and systems don't always know where to start. MLOps practices can offer guidance, but knowing how to establish these initiatives effectively is crucial.
The past year's increasing interest in AI and ML models has many organizations hoping to scale their AI and ML initiatives. MLOps sits at the intersection of machine learning practices and DevOps. It provides a framework for building and deploying ML models that aims to streamline model development and boost reliability and efficiency, much as DevOps does for traditional software.
Yaron Haviv, co-founder and CTO of MLOps platform Iguazio, and Noah Gift, founder of AI education provider Pragmatic AI Labs, unpack this increasing need for MLOps in their book, Implementing MLOps in the Enterprise: A Production-First Mindset. Haviv and Gift offer detailed best practices and tutorials for MLOps initiatives, explaining how to navigate key processes and highlighting which aspects of the framework are most beneficial for anyone looking to integrate MLOps practices into their workflows.
"There are some fundamental components [of MLOps] that are non-negotiable," Gift said. Just as there's no point installing new kitchen appliances in a house without a solid foundation, adopting foundational MLOps practices is the best way to ensure models are deployed successfully in an enterprise.
3 MLOps best practices to consider
Haviv and Gift suggest several overarching tips to help anyone hoping to start building MLOps practices into their organization.
1. Go beyond model training
An MLOps pipeline consists of four main stages: data collection and preparation; model development and training; ML service deployment; and continuous feedback and monitoring.
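The four stages can be sketched as plain functions handing off to one another. This is an illustrative skeleton only -- the function names and the trivial "mean" model are made up for the example, not drawn from any particular framework.

```python
# Minimal sketch of the four MLOps pipeline stages as plain functions.
# All names and the toy "model" are illustrative, not from a real framework.

def collect_and_prepare(raw_rows):
    """Stage 1: data collection and preparation (drop incomplete rows)."""
    return [r for r in raw_rows if all(v is not None for v in r.values())]

def develop_and_train(rows):
    """Stage 2: model development and training (here: a trivial mean 'model')."""
    ys = [r["y"] for r in rows]
    return {"mean": sum(ys) / len(ys)}

def deploy(model):
    """Stage 3: ML service deployment (wrap the model in a callable)."""
    return lambda _features: model["mean"]

def monitor(predict, rows):
    """Stage 4: continuous feedback and monitoring (mean absolute error)."""
    errors = [abs(predict(r) - r["y"]) for r in rows]
    return sum(errors) / len(errors)

raw = [{"x": 1, "y": 2.0}, {"x": 2, "y": 4.0}, {"x": None, "y": 9.0}]
data = collect_and_prepare(raw)          # the incomplete row is dropped
service = deploy(develop_and_train(data))
mae = monitor(service, data)             # feedback signal for the next cycle
```

The point of the sketch is the shape, not the model: each stage consumes the previous stage's output, and the monitoring result feeds the next iteration of the loop.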
Model training is often the initial focus for organizations starting to dip their toes into MLOps. But while training models is important, prioritizing that process over other stages that are equally necessary for the model to work can quickly lead to problems.
"If you're training the model, and you don't know how to reproduce the thing that you just did, then you introduced a lot of non-deterministic behaviors," Gift said. "So you have risk associated with using this [model you created], because what if you're making important decisions based on this model?"
Having a clear methodology in place across all MLOps stages helps teams navigate potential risk. With an MLOps framework in place, organizations can ensure that they aren't implementing models into their workflows without understanding and evaluating all the core components first.
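One concrete way to rein in the non-deterministic behavior Gift describes is to pin every source of randomness to a recorded configuration, so the same config always reproduces the same run. The sketch below is a hedged illustration of that idea using only the standard library; the `train` function and its config keys are hypothetical.

```python
# Hedged sketch: making a training run reproducible by fixing the seed
# and recording a fingerprint of the exact configuration used.
import hashlib
import json
import random

def train(config):
    random.seed(config["seed"])  # pin all randomness to the recorded config
    weights = [random.random() for _ in range(config["n_weights"])]
    # Hash the full config so the artifact can be traced back to its inputs.
    fingerprint = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]
    return weights, fingerprint

cfg = {"seed": 42, "n_weights": 3}
w1, f1 = train(cfg)
w2, f2 = train(cfg)  # same config -> identical weights and fingerprint
```

Logging the fingerprint next to each deployed artifact gives teams a cheap answer to "how do we reproduce the thing we just did?"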
2. Have a production-first mindset
A successful MLOps pipeline begins with the end in mind. In Implementing MLOps in the Enterprise, Haviv and Gift describe this approach as a "production-first mindset."
"A production-first mindset means that you have a critical thinking style when you're putting something into production," Gift said.
Adopting a production-first mindset means that teams constantly and critically evaluate the entire MLOps process, from goal inception through deployment and continuous monitoring. Maintaining a skeptical outlook and keeping an eye on the full picture can help model developers avoid tunnel vision and instead consider the pipeline holistically.
3. The less you can do, the better
Another core tenet of a successful MLOps initiative is to decrease complexity whenever possible.
"A big component ... is to reduce the complexity of what it is an organization is doing," Gift said. "The less things you do for yourself, the less complexity you have in your organization, [and] the greater the chance of success that an organization will have."
While organizations can reduce their MLOps initiatives' complexity in several ways, two practical options are to integrate continuous integration/continuous delivery (CI/CD) pipelines and use pretrained models.
At the heart of CI/CD is the feedback loop, which can signal areas for improvement to MLOps teams, Gift said. That feedback can come from an array of continuous improvement mechanisms, including automated linting, testing and deployment.
"The CI/CD pipeline is almost like a truth serum ... something that's just constantly interrogating your code and making it better," Gift said.
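That "constant interrogation" boils down to running a list of checks on every change and surfacing their feedback. The gate below is a toy illustration of the pattern -- the `lint` and `unit_tests` checks are hypothetical stand-ins for real linting, testing, and deployment steps.

```python
# Illustrative sketch of a CI gate: run each check, collect its feedback,
# and pass only when no check has anything to complain about.
# The check functions are hypothetical stand-ins for real lint/test steps.

def lint(code):
    return [] if "\t" not in code else ["lint: tabs found, use spaces"]

def unit_tests(code):
    return [] if "def " in code else ["test: no functions to test"]

def run_gate(code, checks):
    feedback = [msg for check in checks for msg in check(code)]
    return {"passed": not feedback, "feedback": feedback}

result = run_gate("def add(a, b):\n    return a + b\n", [lint, unit_tests])
```

Real pipelines swap in actual tools for the checks, but the feedback loop is the same: every change is interrogated, and the complaints point at what to improve next.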
Another way to decrease complexity is to use pretrained models rather than building an ML model from the ground up. Rapid technological advancements in AI and ML are beginning to lead to model parity, Gift said -- in other words, many of today's pretrained models operate at similar levels of performance.
Building your own model often doesn't yield a worthwhile ROI, given the time and resources required -- especially for generative AI models, which demand large amounts of data and compute. Adopting a well-researched pretrained model that fits an organization's specific needs can save time and money, freeing teams to concentrate on other aspects of the MLOps pipeline and make better use of their organization's strengths.
To choose among pretrained models, organizations can consider standard characteristics like features -- for instance, whether the deployment mechanism fits the intended use case -- and price points. Another important factor is potential negative externalities, which Gift describes as the drawbacks of integrating certain technologies into an organization.
This might include assessing how an ML model could negatively affect creativity and productivity. But it also encompasses ethical concerns, such as hallucination and bias, as well as increasingly worrisome legal issues. With many AI and ML developers wrapped up in lawsuits over copyright issues, understanding the legal ramifications of using certain pretrained models is essential for organizations hoping to implement them.
These negative externalities can enter organizations as unquantifiable risk, Gift said. "If you use a model that has copyrighted data and then it turns out that the company that trained the model loses the [copyright] lawsuit, are you then also infringing on this copyrighted data?"
Considering these questions is essential when weighing the characteristics of one model over another. Look for model providers that treat responsible AI as a priority at every level, from the model development teams up to boardroom-level executives.
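Weighing features, price, and risk against one another can be made explicit with a simple weighted score. The sketch below is illustrative only -- the candidate models, scores, and weights are all made up, and any real evaluation would need far richer criteria than three numbers.

```python
# Hedged sketch of comparing candidate pretrained models: score each on
# features and price, and subtract a penalty for the risk of negative
# externalities. All numbers and weights here are invented for illustration.

def score(model, weights):
    return (weights["features"] * model["features"]
            + weights["price"] * model["price"]
            - weights["risk"] * model["risk"])  # externalities count against

candidates = [
    {"name": "model_a", "features": 0.9, "price": 0.4, "risk": 0.8},
    {"name": "model_b", "features": 0.7, "price": 0.8, "risk": 0.2},
]
weights = {"features": 0.5, "price": 0.3, "risk": 0.2}
best = max(candidates, key=lambda m: score(m, weights))
# A strong model with high legal/ethical risk can lose to a safer rival.
```

The value of making the comparison explicit is that the risk weighting -- the hardest factor to quantify -- at least becomes a visible, debatable input rather than an afterthought.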
In this excerpt from the book's first chapter, Haviv and Gift describe how an effective MLOps strategy must be comprehensive and systematic.
MLOps: What Is It and Why Do We Need It?
At the root of inefficient systems is an interconnected web of incorrect decisions that compound over time. It is tempting to look for a silver bullet fix to a system that doesn't perform well, but that strategy rarely, if ever, pays off. Consider the human body; there is no shortage of quick fixes sold to make you healthy, but long-term health requires a systematic approach.
Similarly, there is no shortage of advice on "getting rich quick." Here again, the data conflicts with what we want to hear. In Don't Trust Your Gut (HarperCollins, 2022), Seth Stephens-Davidowitz shows that 84% of the top 0.1% of earners receive at least some money from owning a business. Further, the average age of a business founder is about 42, and some of the most successful companies are real estate or automobile dealerships. These are hardly get-rich-quick schemes but businesses that require significant skill, expertise, and wisdom through life experience.
Cities are another example of complex systems that don't have silver bullet fixes. WalletHub created a list of best-run cities in America with San Francisco ranked 149 out of 150 despite having many theoretical advantages over other cities, like beautiful weather, being home to the top tech companies in the world, and a 2022-2023 budget of $14 billion for a population of 842,000 people. The budget is similar to that of the entire country of Panama, which has a population of 4.4 million people. As the case of San Francisco shows, revenue or natural beauty alone isn't enough to have a well-run city; there needs to be a comprehensive plan: execution and strategy matter. No single solution is going to make or break a city. The WalletHub survey points to extensive criteria for a well-run city, including infrastructure, economy, safety, health, education, and financial stability.
Similarly, with MLOps, searching for a single answer to getting models into production, perhaps by getting better data or using a specific deep learning framework, is tempting. Instead, just like these other domains, it is essential to have an evidence-based, comprehensive strategy.
What Is MLOps?
At the heart of MLOps is the continuous improvement of all business activity. The Japanese automobile industry refers to this concept as kaizen, meaning literally "improvement." For building production machine learning systems, this manifests in both the noticeable aspects of improving the model's accuracy as well as the entire ecosystem supporting the model.
A great example of one of the nonobvious components of a machine learning system is the business requirements. If the company needs an accurate model to predict how much inventory to store in the warehouse, but the data science team creates a computer vision system to keep track of the inventory already in the warehouse, the wrong problem is solved. No matter how accurate the inventory-tracking computer vision system is, the business asked for something different, and the system cannot meet the organization's goals as a result.
So what is MLOps? A compound of machine learning (ML) and operations (Ops), MLOps comprises the processes and practices for designing, building, enabling, and supporting the efficient deployment of ML models in production to continuously improve business activity. Similar to DevOps, MLOps is based on automation, agility, and collaboration to improve quality. If you're thinking continuous integration/continuous delivery (CI/CD), you're not wrong. MLOps supports CI/CD. According to Gartner, "MLOps aims to standardize the deployment and management of ML models alongside the operationalization of the ML pipeline. It supports the release, activation, monitoring, performance tracking, management, reuse, maintenance, and governance of ML artifacts."
Olivia Wisbey is associate site editor for TechTarget Enterprise AI. She graduated from Colgate University with Bachelor of Arts degrees in English literature and political science, where she served as a peer writing consultant at the university's Writing and Speaking Center.