How to manage Kubernetes storage challenges
The Kubernetes container orchestration platform has undergone changes to resolve storage challenges, including adding plugins for persistent data. However, some issues remain.
Kubernetes is an orchestration platform that manages containerized applications and services at scale, which makes it a powerful tool for organizations with diverse, changing workloads.
Like containers themselves, Kubernetes was designed to support highly dynamic, stateless operations, with containers being continuously created and destroyed to accommodate varying workloads. However, storage for Kubernetes can be tricky. Many operations require data to persist beyond a container's life span, which can seem at odds with the container's inherently ephemeral nature.
Over the years, Kubernetes developers have incorporated volume plugins into the Kubernetes platform to address the need for persistent storage. A volume plugin is a module that extends the Kubernetes interface to support connectivity with a physical storage volume.
Volume plugins promised an effective way to persist data beyond the container life span, but they also came with a number of challenges. To address these challenges, Kubernetes introduced the FlexVolume plugin, but it too had issues. More recently, Kubernetes came out with a plugin that conforms to the new Container Storage Interface (CSI) standard, which promises to simplify the process of persisting data to various storage platforms.
The Kubernetes environment
Kubernetes is a portable and extendable platform that supports both automation and declarative configuration. The Kubernetes environment comprises a set of independent control processes for managing user workloads through the orchestration of compute, network and storage resources across the Kubernetes cluster.
Kubernetes can support just about any application that runs in a container, making it possible to implement diverse and fluctuating workloads. Each container has its own file system and is isolated from the other containers and the host. As a result, the containers cannot see each other's processes. In addition, because the containers are decoupled from the underlying infrastructure, they're portable across cloud and OS distributions.
A Kubernetes cluster is made up of master and node systems. The master maintains the cluster and serves as the primary point for communicating with outside resources. The nodes are physical servers or virtual machines that run the containerized workloads.
Kubernetes clusters support the following object types for implementing containerized workloads:
- Pod: A logical structure for managing and running a set of related containers. All containers within a pod share storage and network resources. The pod is the smallest deployable computing unit in the Kubernetes cluster.
- Volume: A logical directory defined at the pod level that is accessible to the containers in that pod. The volume has the same life span as the pod, which means it can outlive any of the pod's containers.
- Service: A REST object that serves as an abstraction layer for accessing a logical set of pods in order to decouple front-end clients from back-end operations.
- Namespace: A set of virtual clusters that are based on the same physical cluster. Namespaces are geared toward environments that support many users across multiple teams or projects.
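These objects are typically declared in YAML manifests. As a rough sketch (the names, namespace and images here are hypothetical), a pod with two containers sharing a volume might look like this:

```yaml
# Hypothetical pod: two containers share an emptyDir volume,
# which lives as long as the pod -- outliving container restarts.
apiVersion: v1
kind: Pod
metadata:
  name: shared-storage-demo    # hypothetical name
  namespace: demo-team         # hypothetical namespace
spec:
  volumes:
  - name: shared-data
    emptyDir: {}               # pod-scoped scratch storage
  containers:
  - name: writer
    image: busybox             # hypothetical image
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
```

Because both containers mount the same volume, the reader container can see the file the writer container creates, illustrating the pod-level sharing described above.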
Kubernetes provides controllers that build upon these basic objects to deliver additional functionality. Kubernetes also includes an API that maps to each object type to provide a mechanism for working with the Kubernetes environment.
In addition, each Kubernetes cluster includes a control plane for managing the environment and automating operations. The control plane is a collection of processes that run on both the master and node systems. The master runs several processes for running controllers, managing pods and exposing the Kubernetes API. The nodes each run processes that maintain network rules, perform connection forwarding and ensure that the containers are running.
How storage for Kubernetes works
The pod provides the primary building block for deploying containerized workloads. It contains one or more containers, shared storage resources and a unique network IP address. A pod also includes options for controlling how the containers run. In most cases, a pod supports only a single instance of an application, with more pods added for additional instances.
Storage for Kubernetes happens at the pod level. A pod can be configured with a set of storage volumes that make it possible for the pod's containers to share storage and to persist data that can survive container restarts. Kubernetes offers a wide range of volume types, such as Azure Disk, CephFS, iSCSI, vSphere Volume, Portworx Volume and Amazon Elastic Block Store.
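As an illustration, a pod can mount one of these in-tree volume types directly in its spec. The following minimal sketch uses the Amazon Elastic Block Store volume type mentioned above; the pod name, image and volume ID are placeholders, and the EBS volume is assumed to already exist in the same availability zone as the node:

```yaml
# Hypothetical pod mounting a pre-existing EBS volume via the in-tree plugin.
apiVersion: v1
kind: Pod
metadata:
  name: ebs-demo               # hypothetical name
spec:
  containers:
  - name: app
    image: nginx               # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    awsElasticBlockStore:
      volumeID: "<volume-id>"  # placeholder for an existing EBS volume ID
      fsType: ext4
```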
Kubernetes provides a special type of volume for persisting data beyond the life span of containers and pods. Referred to as the PersistentVolume (PV), it abstracts details about how storage is provided to or consumed by the pod's containers.
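A PV is typically defined by an administrator and then requested by a workload through a PersistentVolumeClaim (PVC), which is what keeps the consuming pod unaware of the underlying storage details. A minimal sketch, assuming an NFS server as the backing storage (the server address and export path are hypothetical):

```yaml
# Hypothetical PersistentVolume backed by NFS.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com    # hypothetical NFS server
    path: /exports/data        # hypothetical export path
---
# A claim that binds to a matching PV; pods reference the claim, not the PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```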
Kubernetes implements PVs as volume plugins that have lifecycles independent from the pod. Volume plugins extend the Kubernetes interface to facilitate connectivity with external storage. The plugins are in-tree modules built and compiled directly into the core Kubernetes binaries, making it possible to deliver storage to containers whenever it's required.
Because the plugins are built into the Kubernetes code, adding a new storage system means vendors must check their plugin code directly into the main code repository, which can introduce instability and security issues into the Kubernetes platform. This process also ties vendors to the Kubernetes release cycle, while forcing them to open their plugin source code to anyone who accesses the Kubernetes code.
To address these limitations, Kubernetes introduced the FlexVolume plugin, which provides an out-of-tree plugin interface that supports storage-related communications. In this way, storage vendors can develop drivers that are deployed to the Kubernetes environment, where they can be accessed by the FlexVolume plugin.
While this approach benefits some vendors, it also comes with several challenges. For example, the FlexVolume plugin can be difficult to deploy, and it requires access to the root file system on each cluster machine.
CSI volume plugin
More recently, Kubernetes has introduced a new CSI volume plugin, which promises to address these issues. It provides an out-of-tree solution that adheres to the CSI standard. CSI exposes storage systems to containerized workloads managed by container orchestration tools, such as Kubernetes. The new standard makes it possible for a vendor to develop a single driver that can support any CSI-compliant container orchestration solution.
The CSI plugin became generally available in Kubernetes with the release of version 1.13. As with the FlexVolume plugin, a vendor can develop a CSI-compliant driver that's deployed to the Kubernetes environment, while avoiding many of the challenges that come with the FlexVolume plugin. The vendor doesn't have to touch the Kubernetes code or worry about how Kubernetes is implemented. Once the driver has been installed, users can rely on the CSI volume plugin to carry out tasks such as attaching or mounting volumes to persist data.
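In practice, a cluster administrator typically exposes an installed CSI driver through a StorageClass, and users then request storage with ordinary PersistentVolumeClaims. A rough sketch, where the provisioner name stands in for whatever name the vendor's CSI driver registers:

```yaml
# Hypothetical StorageClass pointing at a vendor's CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-csi
provisioner: csi.vendor.example.com   # hypothetical CSI driver name
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# A claim against that class; the CSI driver provisions a volume on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-csi
  resources:
    requests:
      storage: 10Gi
```

The claim looks identical to one backed by an in-tree plugin, which is the point: the CSI driver's details stay behind the StorageClass abstraction.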
Although Kubernetes was initially designed to support only stateless operations, it can now orchestrate stateful containerized workloads as well. In the past, efforts to persist data past the life spans of containers or pods often came with challenges. The introduction of the CSI plugin, however, promises to make it easier for organizations to adopt the container model for their workloads, especially as more storage vendors offer CSI-compliant drivers. For many IT teams, CSI could prove just the impetus they need to transition to Kubernetes and implement persistent storage for their containerized workloads.