Set up a machine learning pipeline in this Kubeflow tutorial
For teams running machine learning workflows with Kubernetes, using Kubeflow can lead to faster, smoother deployments. Get started with this installation guide.
You don't have to use Kubernetes to power machine learning deployments. But if you do -- and there are many reasons why you might want to -- Kubeflow is the simplest and fastest way to get machine learning workloads up and running on Kubernetes.
Kubeflow is an open source tool that streamlines the deployment of machine learning workflows on top of Kubernetes. Kubeflow's main purpose is to simplify setting up environments for building, testing, training and operating machine learning models and applications for data science and MLOps teams.
It's possible to deploy machine learning tools such as TensorFlow and PyTorch on a Kubernetes cluster directly without using Kubeflow, but Kubeflow automates much of the process required to get these tools up and running. To decide whether it's the right choice for your machine learning projects, learn how Kubeflow works, when to use it and how to install it to deploy a machine learning pipeline.
The pros and cons of Kubernetes and Kubeflow for machine learning
Before deciding whether to use Kubeflow specifically, it's important to understand the pros and cons of running AI and machine learning workflows on Kubernetes in general.
Should you run machine learning models on Kubernetes?
As a platform for hosting machine learning workflows, Kubernetes offers several advantages.
The first is scalability. With Kubernetes, you can easily add or remove nodes from a cluster to modify the total resources available to that cluster. This is particularly beneficial for machine learning workloads, whose resource requirements can fluctuate significantly. For example, you might want to scale your cluster up during model training, which usually requires a lot of resources, then scale back down to reduce infrastructure costs after training is done.
Hosting machine learning workflows on Kubernetes also offers the advantage of providing containers access to bare-metal hardware. This is useful for accelerating the performance of your workloads using GPUs or other hardware that wouldn't be accessible on virtual infrastructure. Although you could access bare-metal infrastructure without using Kubernetes by running workloads in standalone containers, orchestrating containers with Kubernetes makes it easier to manage workloads at scale.
A major reason why you might not want to use Kubernetes to host machine learning workflows, however, is that it adds another layer of complexity to your software stack. For smaller workloads, a Kubernetes-based deployment might be overkill. In such situations, running workloads directly on VMs or bare-metal servers could make more sense.
When should you choose Kubeflow?
The chief advantage of using Kubeflow for machine learning is the tool's fast and simple deployment process. With just a few kubectl commands, you get a ready-to-use environment where you can start deploying machine learning workflows.
On the other hand, Kubeflow restricts you to the tools and frameworks it supports -- and might include some resources that you won't end up using. If you just need one or two specific machine learning tools, you might find it simpler to deploy them individually rather than with Kubeflow. But for anyone who needs a general-purpose machine learning environment on Kubernetes, it's hard to argue against using Kubeflow.
Kubeflow tutorial: Install and setup walkthrough
On most Kubernetes distributions, installing Kubeflow boils down to running just a few commands.
This tutorial demonstrates the process using K3s, a lightweight Kubernetes distribution that you can run on a laptop or PC, but you should be able to follow the same steps on any mainstream Kubernetes platform.
Step 1. Create a Kubernetes cluster
Start by creating a Kubernetes cluster if you don't already have one up and running.
To set up a cluster using K3s, first download and install K3s with the following command.
curl -sfL https://get.k3s.io | sh -
Next, run the command below to start a cluster.
sudo k3s server &
To check that everything's running as expected, run the following command.
sudo k3s kubectl get node
The output should resemble the following.
NAME            STATUS   ROLES                  AGE    VERSION
chris-gazelle   Ready    control-plane,master   2m7s   v1.25.7+k3s1
Step 2. Install Kubeflow
With your cluster up and running, the next step is to install Kubeflow.
Use the following commands to do this on a local machine using K3s.
sudo -s
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"
If you're installing Kubeflow on a nonlocal Kubernetes cluster, the commands below will work in most cases.
export PIPELINE_VERSION=<kfp-version-between-0.2.0-and-0.3.0>
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
Step 3. Verify that containers are running
Even after you install Kubeflow, it isn't fully operational until all of its containers are running. Verify the status of your containers with the following command.
kubectl get pods -n kubeflow
If the containers aren't running successfully after several minutes, check their logs to determine the cause.
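For example, the following commands can help narrow down the problem. They use the kubeflow namespace from the install step above; `<pod-name>` is a placeholder you'd replace with the name of a failing pod from your own cluster.

```shell
# List any Kubeflow pods that are not yet in the Running state
kubectl get pods -n kubeflow --field-selector=status.phase!=Running

# Show recent events and container status for a problem pod
kubectl describe pod <pod-name> -n kubeflow

# Print the pod's logs to look for the underlying error
kubectl logs <pod-name> -n kubeflow
```

A common cause on small machines is pods stuck in Pending because the node lacks sufficient CPU or memory, which `kubectl describe pod` will report in its Events section.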
Step 4. Start using Kubeflow
Kubeflow provides a web-based dashboard for creating and deploying pipelines. To access the dashboard, first set up port forwarding to the dashboard service by running the command below.
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
If you're running Kubeflow locally, you can access the dashboard by opening a web browser to http://localhost:8080. If you installed Kubeflow on a remote machine, replace localhost with the IP address or hostname of the server where you're running Kubeflow.