Set up a machine learning pipeline in this Kubeflow tutorial

For teams running machine learning workflows with Kubernetes, using Kubeflow can lead to faster, smoother deployments. Get started with this installation guide.

You don't have to use Kubernetes to power machine learning deployments. But if you do -- and there are many reasons why you might want to -- Kubeflow is one of the simplest and fastest ways to get machine learning workloads up and running on Kubernetes.

Kubeflow is an open source tool that streamlines the deployment of machine learning workflows on top of Kubernetes. Kubeflow's main purpose is to simplify setting up environments for building, testing, training and operating machine learning models and applications for data science and MLOps teams.

It's possible to deploy machine learning tools such as TensorFlow and PyTorch on a Kubernetes cluster directly without using Kubeflow, but Kubeflow automates much of the process required to get these tools up and running. To decide whether it's the right choice for your machine learning projects, learn how Kubeflow works, when to use it and how to install it to deploy a machine learning pipeline.

The pros and cons of Kubernetes and Kubeflow for machine learning

Before deciding whether to use Kubeflow specifically, it's important to understand the pros and cons of running AI and machine learning workflows on Kubernetes in general.

Should you run machine learning models on Kubernetes?

As a platform for hosting machine learning workflows, Kubernetes offers several advantages.

The first is scalability. With Kubernetes, you can easily add or remove nodes from a cluster to modify the total resources available to that cluster. This is particularly beneficial for machine learning workloads, whose resource consumption requirements can fluctuate significantly. For example, you might want to scale your cluster up during model training, which usually requires a lot of resources, then scale back down to reduce infrastructure costs after training is done.

Figure: Machine learning project steps: identify a business problem, lay out the process and gather information from experts, choose and prepare data, choose and tune an algorithm, and retune based on results. Tools such as Kubeflow can speed up deployment of machine learning projects by standardizing and streamlining stages of the model development lifecycle.

Hosting machine learning workflows on Kubernetes also gives containers direct access to bare-metal hardware. This is useful for accelerating workloads with GPUs or other specialized hardware that can be harder to reach from virtual infrastructure. Although you could access bare-metal infrastructure without using Kubernetes by running workloads in standalone containers, orchestrating containers with Kubernetes makes it easier to manage workloads at scale.
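
To make the GPU point concrete, here is a minimal sketch of how a container can ask Kubernetes for a GPU. It assumes the cluster runs NVIDIA's device plugin, which exposes the nvidia.com/gpu resource. The function name, the pod and container names, and the image argument are placeholders chosen for illustration.

```shell
# Hypothetical helper: print a pod manifest that requests one GPU.
# Assumes the NVIDIA device plugin is installed on the cluster.
gpu_pod_manifest() {
  image="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: ${image}
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}

# Usage: gpu_pod_manifest <your-training-image> | kubectl apply -f -
```

Kubernetes schedules a pod like this only on a node that has a free GPU, so the workload itself doesn't need to know which machine it lands on.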

A major reason why you might not want to use Kubernetes to host machine learning workflows, however, is that it adds another layer of complexity to your software stack. For smaller workloads, a Kubernetes-based deployment might be overkill. In such situations, running workloads directly on VMs or bare-metal servers could make more sense.

When should you choose Kubeflow?

The chief advantage of using Kubeflow for machine learning is the tool's fast and simple deployment process. With just a few kubectl commands, you get a ready-to-use environment where you can start deploying machine learning workflows.

On the other hand, Kubeflow restricts you to the tools and frameworks it supports -- and might include some resources that you won't end up using. If you just need one or two specific machine learning tools, you might find it simpler to deploy them individually rather than with Kubeflow. But for anyone who needs a general-purpose machine learning environment on Kubernetes, it's hard to argue against using Kubeflow.

Kubeflow tutorial: Install and setup walkthrough

On most Kubernetes distributions, installing Kubeflow boils down to running just a few commands.

This tutorial demonstrates the process using K3s, a lightweight Kubernetes distribution that you can run on a laptop or PC, but you should be able to follow the same steps on any mainstream Kubernetes platform.

Step 1. Create a Kubernetes cluster

Start by creating a Kubernetes cluster if you don't already have one up and running.

To set up a cluster using K3s, first download and install K3s with the following command.

curl -sfL https://get.k3s.io | sh -

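On most Linux systems, the install script also starts K3s as a background service. If the server isn't already running, start a cluster manually with the command below.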

sudo k3s server &

To check that everything's running as expected, run the following command.

sudo k3s kubectl get node

The output should resemble the following.

NAME            STATUS   ROLES                  AGE    VERSION
chris-gazelle   Ready    control-plane,master   2m7s   v1.25.7+k3s1
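
If the node reports NotReady at first, it might simply still be starting. As a minimal sketch, the helper below blocks until every node is Ready, so you don't have to rerun the check by hand. The function name wait_for_nodes and the 120-second default are illustrative choices, not part of K3s.

```shell
# Hypothetical helper: block until all nodes report Ready.
# The name and the default timeout are assumptions for illustration.
wait_for_nodes() {
  timeout="${1:-120s}"
  # kubectl wait exits nonzero if the condition isn't met before the timeout.
  sudo k3s kubectl wait --for=condition=Ready node --all --timeout="$timeout"
}

# Usage: wait_for_nodes        (default timeout)
#        wait_for_nodes 300s   (custom timeout)
```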

Step 2. Install Kubeflow

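With your cluster up and running, the next step is to install Kubeflow. The commands in this step deploy Kubeflow Pipelines, the Kubeflow component for building and running machine learning pipelines.

Use the following commands to do this on a local machine using K3s. The first command opens a root shell, which K3s requires by default to read the cluster configuration.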

sudo -s
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

If you're installing Kubeflow on a nonlocal Kubernetes cluster, the commands below will work in most cases. Replace the placeholder in the first line with the Kubeflow Pipelines version you want to install.

export PIPELINE_VERSION=<kfp-version-between-0.2.0-and-0.3.0>
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"

Step 3. Verify that containers are running

Even after you install Kubeflow, it's not fully operational until all of its containers are running. Verify the status of your containers with the following command.

kubectl get pods -n kubeflow
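
Instead of rerunning that command until every pod shows Running, you can have kubectl wait for you. The sketch below wraps kubectl wait in a small helper. The function name and the 10-minute default are assumptions, and the kubeflow namespace matches the command above.

```shell
# Hypothetical helper: wait until every pod in a namespace is Ready.
wait_for_pods() {
  ns="${1:-kubeflow}"
  timeout="${2:-600s}"
  # Exits nonzero if any pod is still not Ready when the timeout expires.
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout"
}

# Usage: wait_for_pods
#        wait_for_pods kubeflow 900s
```

Pods that ran to completion never report Ready, so a timeout here doesn't always mean the installation failed. Check the pod list before assuming the worst.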

If the containers aren't running successfully after several minutes, check out their logs to determine the cause.
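
As a starting point for that investigation, the sketch below prints a pod's description and its recent log output. The function name inspect_pod and the 50-line limit are illustrative. Pass it a pod name taken from the kubectl get pods output.

```shell
# Hypothetical helper: show the description and recent logs for one pod.
inspect_pod() {
  pod="$1"
  ns="${2:-kubeflow}"
  # describe surfaces scheduling problems, image pull errors and restart counts.
  kubectl describe pod "$pod" -n "$ns"
  # logs prints recent output from all of the pod's containers.
  kubectl logs "$pod" -n "$ns" --all-containers --tail=50
}

# Usage: inspect_pod <pod-name>
```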

Step 4. Start using Kubeflow

Kubeflow provides a web-based dashboard to create and deploy pipelines. To access that dashboard, first set up port forwarding by running the command below. The command keeps running in the foreground, so leave it open while you use the dashboard.

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

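If you're running Kubeflow locally, you can access the dashboard by opening a web browser to the URL http://localhost:8080. If you installed Kubeflow to a remote machine, replace localhost with the IP address or server hostname where you're running Kubeflow. Because kubectl port-forward listens only on localhost by default, remote access also requires binding it to a reachable address with the --address option.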
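
Before opening a browser, you can confirm from a second terminal that the dashboard answers on the forwarded port 8080. This is a minimal sketch that assumes curl is installed. The function name dashboard_status is illustrative.

```shell
# Hypothetical helper: print the HTTP status code returned by the dashboard.
dashboard_status() {
  url="${1:-http://localhost:8080}"
  # -s hides progress output, -o discards the body, -w prints the status code.
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
}

# Usage: dashboard_status
#        dashboard_status http://<server-address>:8080
```

A 200 response suggests the UI is being served. If curl can't connect at all, check that the port-forward command is still running.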
