A primer on Ceph storage technology, products

Ceph is a scalable distributed storage platform that can evolve as needed. Take a deep dive into its popularity, assorted offerings and how to get around challenges.

Organizations routinely deploy Ceph storage in their data centers or consider such a move. The platform is popular now because of its flexibility, scalability, reliability and availability.

Ceph is an open source platform that distributes data across multiple storage nodes. Organizations can deploy it on commodity hardware, which helps to reduce infrastructure costs.

Ceph is a complex system that can be difficult to implement and maintain, however. IT teams should fully understand how it works and carefully plan each deployment. In some cases, organizations turn to supported commercial Ceph products rather than trying to implement the free version on their own.

How Ceph storage works

Traditional storage architectures often rely on a centralized interface or other component that acts as an entry point to the underlying storage subsystems. This approach can limit scalability, decrease performance and create a single point of failure.

Ceph eliminates the need for a centralized entry point by distributing the data across multiple nodes that can interact directly with each other and with the client systems. Ceph can also deliver object, block and file storage services, all within the same cluster.

To facilitate these operations, Ceph employs the following software abstraction layers to create a complete software-defined storage platform:

  • Reliable Autonomic Distributed Object Store (RADOS). A scalable, intelligent storage system that distributes data throughout the Ceph cluster and serves as the platform's foundation.
  • RADOS Gateway (RGW). An object storage service that's compatible with both the Amazon S3 and OpenStack Swift storage APIs.
  • RADOS Block Device (RBD). A block storage service that spreads virtual disks over objects to provide benefits similar to a SAN.
  • Ceph File System (CephFS). A file storage service that provides a distributed file system compliant with the Portable Operating System Interface (POSIX).
  • Librados. An API that enables organizations to create their own interfaces to the Ceph storage cluster rather than needing to go through the RGW, RBD or CephFS interfaces.

Figure: Ceph architectural components

By using these abstraction layers, Ceph can achieve a higher level of scalability and flexibility than is possible with traditional storage systems. Ceph clusters use these layers to create logical storage pools decoupled from the underlying physical storage.
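
To make the RBD layer described above more concrete, the following minimal sketch shows how a byte offset in a virtual disk could map to one of the objects that back it. It assumes a simple layout with a fixed 4 MiB object size and no custom striping; the function names `locate` and `objects_touched` are hypothetical and are not part of any Ceph API.

```python
from typing import List, Tuple

# Assumption for illustration: every backing object holds 4 MiB of the disk.
OBJECT_SIZE = 4 * 1024 * 1024


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Return (object index, offset inside that object) for a disk byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int,
                    object_size: int = OBJECT_SIZE) -> List[int]:
    """Return the indexes of all objects that a read or write of `length` bytes spans."""
    if length <= 0:
        return []
    first, _ = locate(offset, object_size)
    last, _ = locate(offset + length - 1, object_size)
    return list(range(first, last + 1))
```

Because each object is stored independently in the cluster, a large read or write that spans several objects can be served by several storage nodes in parallel.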

How Ceph clusters work

A cluster is made up of multiple nodes running on commodity servers. Most of them are storage nodes that attach to HDDs, SSDs or a combination of both. The other nodes carry out management and storage operations, working in conjunction with the storage nodes.

A cluster's nodes run different types of daemons that perform these operations. The daemons can communicate directly with each other and with the clients accessing the data, eliminating any single point of failure. A Ceph cluster relies on four types of daemons:

  1. Ceph Monitor. This daemon maintains a master copy of the cluster maps, which track details about the cluster's current topology and status.
  2. Ceph Object Storage Daemon (OSD). A Ceph cluster typically includes numerous storage nodes that facilitate data access and management. Each storage node runs multiple OSD instances, typically one per drive, which, in turn, interface with the individual drives that house the data.
  3. Ceph Manager. This daemon works in conjunction with Monitor to track runtime metrics and cluster state information, such as storage utilization, system load and performance metrics.
  4. Ceph Metadata Server. The MDS daemon is used only for clusters that implement CephFS. The MDS daemon stores and manages file metadata, such as file names, timestamps and data paths.

Ceph uses the Controlled Replication Under Scalable Hashing (CRUSH) algorithm, in conjunction with the daemons, to compute where storage objects are located without relying on a central lookup table. The algorithm determines which placement groups and OSDs should contain the individual storage objects. In this way, CRUSH can distribute data across OSDs on a massive scale and use intelligent data replication to provide resiliency.
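
The idea of computing a location instead of looking it up can be illustrated with a short sketch. This is not the CRUSH algorithm itself, which also accounts for device weights and the failure-domain hierarchy in the cluster map. It is a simplified stand-in that hashes an object name to a placement group and then ranks OSDs with rendezvous hashing. All function names, the SHA-256 hash and the default of three replicas are assumptions made for illustration.

```python
import hashlib
from typing import List


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash so every client computes the same answer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name: str, pg_count: int) -> int:
    """Map an object name to a placement group (PG) number."""
    return stable_hash(object_name) % pg_count


def pg_to_osds(pg_id: int, osd_ids: List[int], replicas: int) -> List[int]:
    """Pick `replicas` distinct OSDs for a PG by ranking per-OSD hash scores."""
    if replicas > len(osd_ids):
        raise ValueError("not enough OSDs for the requested replica count")
    ranked = sorted(osd_ids,
                    key=lambda osd: stable_hash(f"{pg_id}:{osd}"),
                    reverse=True)
    return ranked[:replicas]


def locate_object(object_name: str, pg_count: int,
                  osd_ids: List[int], replicas: int = 3) -> List[int]:
    """Compute, without any lookup table, which OSDs should hold an object."""
    return pg_to_osds(object_to_pg(object_name, pg_count), osd_ids, replicas)
```

Any client that knows the placement group count and the list of OSDs arrives at the same result on its own, which is the property that removes the need for a central lookup table.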

Why the tech is hot now

Organizations turn to Ceph storage for a variety of reasons. They're increasingly adopting cloud-native architectures, which affect how they manage resources and deploy applications. Ceph offers the scalability and flexibility needed to apply these architectures to their environments, which helps eliminate the type of bottlenecks found in traditional storage systems.

Scalability has become increasingly important as data volumes grow. Much of the data is unstructured, such as text files, images and videos. Ceph has been optimized to handle data at petabyte or even exabyte scale. The need to manage such massive amounts of data has become even more important with the rapid growth of AI, machine learning and deep learning.
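
Planning at this scale involves some simple arithmetic, because replicated data consumes more raw capacity than the data itself. The sketch below assumes three-way replication and a target fill ratio of 80%; both values, like the function names, are assumptions chosen for illustration rather than recommendations.

```python
import math


def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def drives_needed(usable_tb: float, drive_tb: float,
                  replicas: int = 3, fill_ratio: float = 0.8) -> int:
    """Drives required to hold `usable_tb` of data without exceeding `fill_ratio`."""
    raw_needed_tb = usable_tb * replicas / fill_ratio
    return math.ceil(raw_needed_tb / drive_tb)
```

Under these assumptions, 1 PB (1,000 TB) of usable data on 16 TB drives calls for `drives_needed(1000, 16)`, or 235 drives.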

The movement toward software-defined storage has come to play an important role in managing data and storage resources within data centers, in edge environments and across geographic locations. Software-defined storage, such as Ceph, provides the type of flexibility needed to accommodate evolving storage topologies and growing data volumes, as well as to support modern workloads and technologies, such as infrastructure as code.

In addition, Ceph is free, open source software. Organizations can adapt the platform to their specific needs and minimize licensing and equipment costs.

Organizations implement Ceph storage to support a variety of uses. For example, they might turn to Ceph when deploying private clouds, container-based applications or disaggregated infrastructures. Some might deploy Ceph to support image and video repositories. Others use Ceph for data backup and archiving. Organizations might also use Ceph for their VMs or database systems.

Challenges to implementation

Ceph's inherent complexity is one of its most cited issues. Organizations that don't have the necessary expertise can run into problems with both setup and management. If organizations don't carefully plan and implement Ceph, client applications can encounter bottlenecks and latency issues. Fixing these problems is often time-consuming and costly.

Some users have reported a lack of broad community support, at least when compared to other systems. Others cite inconsistent or outdated documentation or a complete lack of documentation. These issues in a critical storage system can have a far-reaching impact on workloads that rely on the data.

Ceph also depends on a well-designed network to deliver its full functionality and the necessary performance. A high-performing cluster is possible only with the right network configuration. Setting up such a network can require significant time and resources.

A Ceph implementation typically requires a public network for client communication and monitoring, plus a storage cluster network for OSD operations. To get the network topology right, an organization might need additional expertise and resources.
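
As a rough illustration of the two-network layout, the sketch below parses a small configuration fragment and checks that the public and cluster networks are separate subnets. The `public_network` and `cluster_network` option names follow Ceph's configuration file conventions, but the subnet values and the `check_networks` helper are hypothetical, and the check assumes the goal is two non-overlapping networks.

```python
import configparser
import ipaddress

# Hypothetical example values; substitute the subnets used in a real deployment.
SAMPLE_CONF = """
[global]
public_network = 192.168.10.0/24
cluster_network = 192.168.20.0/24
"""


def check_networks(conf_text: str):
    """Parse the [global] section and confirm the two subnets do not overlap."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    public = ipaddress.ip_network(parser["global"]["public_network"])
    cluster = ipaddress.ip_network(parser["global"]["cluster_network"])
    if public.overlaps(cluster):
        raise ValueError(f"public {public} and cluster {cluster} networks overlap")
    return public, cluster


if __name__ == "__main__":
    public, cluster = check_networks(SAMPLE_CONF)
    print(f"public network:  {public}")
    print(f"cluster network: {cluster}")
```

A check like this could run before deployment to catch a mistyped subnet early, when it is still cheap to fix.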

Vendors and products

IT teams can implement Ceph storage in their data centers as needed without concern for licensing costs. Because Ceph is a complex system, however, many organizations turn to commercial products to simplify deployment and ongoing maintenance.

Multiple vendors now offer products based on Ceph technologies. Three vendors, in particular, have been at the forefront of the movement, each providing its own version of the Ceph platform. This unranked list is in alphabetical order:

  • Canonical Ceph. This platform has been designed for petabyte-scale deployments, while helping to ease the burden on organizations that don't have the in-house expertise needed to implement Ceph. The Canonical Ceph tooling enables IT teams to manage Ceph's entire deployment, configuration and operational lifecycle.
  • IBM Storage Ceph. IBM offers a vendor-supported Ceph distribution that is part of its portfolio of software-defined storage. IBM Storage Ceph enables IT teams to build data lakehouses for IBM watsonx.data and for next-generation AI workloads.
  • Red Hat Ceph Storage. Red Hat's version of Ceph offers a simplified storage platform engineered for data analytics, AI and machine learning, and other emerging workloads. IT teams can implement the platform on their choice of industry-standard hardware, while using technologies such as OpenShift and Kubernetes.

In addition, SoftIron sells a storage suite powered by Ceph. Mirantis offers a Ceph-based platform for cloud environments. Other vendors that provide Ceph-related products include SUSE, Aspen Systems and Virtunet Systems, while the open source Rook project orchestrates Ceph on Kubernetes.

Robert Sheldon is a technical consultant and freelance technology writer. He has written numerous books, articles and training materials related to Windows, databases, business intelligence and other areas of technology.
