
5 Ceph storage questions answered and explained

What do you need to know about Ceph open source storage to decide if it's right for your organization? See how it compares to object storage alternatives, and optimize its use.

Ceph storage is one of the most popular object storage options available. This open source, highly scalable, unified storage comes with several advantages.

Ceph offers features commonly found in other enterprise storage products, but it's likely to be less expensive than a traditional SAN. As an open source system, Ceph doesn't have the licensing fees of proprietary systems. It also doesn't depend on expensive, specialized hardware and can be installed on commodity hardware.

Among Ceph's other benefits are scalability and flexibility. Ceph provides multiple interfaces -- object, block and file -- for storage access, and you increase system capacity by adding more servers.
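As a rough illustration of what adding servers means for capacity, the sketch below estimates usable space in a replicated cluster. The function name, the server and disk sizes, and the replica count of three are all assumptions chosen for the example, not figures from this article or a sizing recommendation.

```python
def usable_capacity_tb(servers, disks_per_server, disk_tb, replicas=3):
    """Estimate usable capacity of a replicated cluster, in TB.

    Every argument is a hypothetical planning figure.
    """
    raw_tb = servers * disks_per_server * disk_tb
    # Each object is stored `replicas` times, so usable space is raw / replicas.
    return raw_tb / replicas


before = usable_capacity_tb(servers=4, disks_per_server=12, disk_tb=8)
after = usable_capacity_tb(servers=6, disks_per_server=12, disk_tb=8)
print(f"4 servers: {before:.0f} TB usable")
print(f"6 servers: {after:.0f} TB usable")
```

Under these assumptions, going from four to six servers raises usable capacity from 128 TB to 192 TB. Real clusters also need free headroom for recovery, which this sketch ignores.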

Some downsides to using Ceph include the need for a fast -- more expensive -- network. It's also not going to be free for every organization. If you're using it to store important data, you're likely going to use one of the two commercial versions available from Red Hat or SUSE with vendor support. Nevertheless, Ceph is a cheaper, viable alternative to a proprietary SAN.

What follows are answers to five questions about using Ceph. Find out how Ceph compares to Swift and GlusterFS, what the differences are between open source and commercial Ceph, how to optimize Ceph and how to use it with Windows machines.

Which object storage system is better: Ceph or Swift?

Ceph and Swift are object storage systems that distribute and replicate data across a cluster. They use the XFS file system or an alternative Linux file system. They were both built to scale, so users can easily add storage nodes.

But when it comes to accessing data, Swift was developed with a cloud focus and uses a RESTful API. Applications can bypass the OS and directly access Swift. That's great in a cloud environment, but it makes accessing Swift storage anywhere else challenging.

Ceph is a more flexible object storage system, with four access methods: an Amazon S3-compatible RESTful API, CephFS, RADOS Block Device and an iSCSI gateway. Ceph and Swift also differ in the way clients access them. With Swift, clients must go through a Swift gateway, creating a single point of failure. Ceph, on the other hand, uses an object storage daemon (OSD) that runs on each storage node. The other component used to access the object store runs on the client. Here, too, Ceph is more flexible.

Ceph data is strongly consistent across the cluster: a write is acknowledged only after the replicas have been updated. Swift data is eventually consistent and may take time to synchronize across the cluster. Given that difference, Ceph does well in single-site environments, interacting with data types that need a high level of consistency, such as virtual machines and databases. Swift is better in large environments that work with huge amounts of data.
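The difference between the two consistency models can be sketched with a toy replicated store. This is not Ceph or Swift code; the class, its methods and the keys are hypothetical, and the model ignores failures, networking and concurrency.

```python
class ReplicatedStore:
    """Toy key-value store with several replicas (illustration only)."""

    def __init__(self, replicas=3, strong=True):
        self.replicas = [{} for _ in range(replicas)]
        self.strong = strong
        self.pending = []  # writes not yet copied to every replica

    def write(self, key, value):
        if self.strong:
            # Strong consistency: update every replica before acknowledging.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual consistency: acknowledge after one replica, copy later.
            self.replicas[0][key] = value
            self.pending.append((key, value))

    def sync(self):
        """Background replication step for the eventually consistent mode."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)


strong = ReplicatedStore(strong=True)
strong.write("vm-disk", "v2")
strong_read = strong.read("vm-disk", replica_index=2)

eventual = ReplicatedStore(strong=False)
eventual.write("photo", "v2")
stale_read = eventual.read("photo", replica_index=2)  # before sync
eventual.sync()
fresh_read = eventual.read("photo", replica_index=2)  # after sync
```

In the strong mode, any replica returns the new value as soon as the write returns. In the eventual mode, a read from another replica can miss the write until the background sync runs, which is acceptable for large archives but not for a virtual machine disk or a database.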

What's the difference between Ceph and GlusterFS?

GlusterFS and Ceph are open source storage systems that work well in cloud environments. They both can easily integrate new storage devices into existing storage infrastructure, use replication for high availability and run on commodity hardware. In both systems, metadata access is decentralized, ensuring no central point of failure.

While there are a lot of similarities, there are also key differences. GlusterFS is a Linux-based file system that integrates easily into a Linux environment but not into a Windows environment.

Ceph, on the other hand, provides highly scalable object-, block- and file-based storage under a unified system. As with all object storage, applications write to storage using direct API access and bypassing the OS. Given that, Ceph storage integrates just as easily with Windows as it does with Linux. For this and other reasons, Ceph is the better choice for heterogeneous environments, where Linux and other OSes are used.

When it comes to speed in the Ceph vs. GlusterFS debate, neither system outperforms the other. And GlusterFS remains mostly associated with Red Hat, while Ceph has been widely adopted by the open source community.

Open source Ceph vs. commercial Ceph: How do they compare?

Because Ceph is open source software, users can integrate it into any software-defined storage system for free as long as the source code remains available. Ceph offers a startup guide that walks users through the steps to making the software available on a Linux distribution and setting up an open source Ceph environment.

However, this is a complex process that requires expertise. That's where commercial Ceph distributions come into play; they can be easier to implement, and the vendors offer support.

The two commercial Ceph products available are Red Hat Ceph Storage and SUSE Enterprise Storage. There are technical differences between the two distributions. SUSE developed the Ceph iSCSI gateway, enabling users to access Ceph storage like any other storage product. Red Hat integrated Ceph-Ansible, a configuration management tool that's relatively easy to set up and configure.

What are the best ways to optimize Ceph performance?

SATA drives are sufficient for good Ceph performance. Ceph's Controlled Replication Under Scalable Hashing, or CRUSH, algorithm decides where to store data in the Ceph object store. It's designed to guarantee fast access to Ceph storage. However, Ceph requires a 10 Gbps network for optimum speed, with 40 Gbps being even better.
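The idea that placement is computed rather than looked up can be sketched in a few lines. The example below is not CRUSH itself: it uses plain rendezvous (highest random weight) hashing as a stand-in, and the function name, OSD names and replica count are assumptions for illustration. The real algorithm also accounts for the cluster map, device weights and failure domains.

```python
import hashlib


def place_object(object_name, osds, replicas=3):
    """Pick `replicas` distinct OSDs for an object by highest hash score.

    Simplified stand-in for CRUSH: any client that knows the OSD list
    computes the same answer without asking a central lookup service.
    """
    def score(osd):
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(osds, key=score, reverse=True)[:replicas]


osds = [f"osd.{i}" for i in range(6)]  # hypothetical OSD names
first = place_object("image-0001", osds)
second = place_object("image-0001", osds)
print(first)
```

Because the result depends only on the object name and the list of OSDs, repeated calls agree with each other, and removing an OSD that does not hold the object leaves its placement unchanged.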

A few large machines configured with many disks will deliver the best performance. However, the journal disk must be separate from the object storage devices. Using an SSD-based journal will deliver the fastest speed, and using the B-tree file system, or Btrfs, will provide optimal Ceph performance.

How do you integrate Ceph and Windows machines?

There are two ways to integrate Ceph and Windows: Ceph Gateway and the iSCSI target in SUSE Enterprise Storage. Ceph Gateway provides applications with RESTful API access, but that's not the best way to provide access to an OS.

With the iSCSI target in SUSE Enterprise Storage, Ceph can be configured as an iSCSI-based SAN. This makes Ceph available for an OS, such as the Windows server OS, that runs iSCSI Initiator.
