GlusterFS vs. Ceph: Weighing the open source combatants

When considering open source storage software, GlusterFS and Ceph share that designation and little else. Knowing how each option works can help in the selection process.

The world of storage has changed a lot recently. A decade ago, the Fibre Channel SAN was the standard for enterprise storage. In current environments, influenced by infrastructure-as-a-service cloud, data storage needs to be more flexible.

GlusterFS and Ceph are two flexible storage systems that perform very well in cloud environments.

Before trying to understand what is similar and what is different in GlusterFS vs. Ceph, let's talk about some of the requirements for flexible storage in a cloud environment.

  • Scale-up and scale-out. In a cloud environment, it must be extremely easy to add more storage to servers, as well as to extend the pool of available storage. Ceph and GlusterFS both meet this requirement by making it easy to integrate new storage devices into an existing storage offering.
  • High availability. GlusterFS and Ceph use replication that writes data to different storage nodes simultaneously. The aim is to improve both access speed and the availability of data. In Ceph, data is replicated to three different nodes by default, so a copy stays available when a node fails.
  • Commodity hardware. GlusterFS and Ceph both run on top of the Linux operating system (OS). Hence, the only hardware requirement is that the hardware is capable of running Linux. Any commodity hardware runs the Linux OS, with the result that companies using these technologies can dramatically reduce their investment in hardware -- if they so choose. Reality, however, shows that many companies are investing in hardware specifically to run GlusterFS or Ceph for the simple fact that faster hardware leads to faster access to the storage.
  • Decentralized. In a cloud environment, there should never be a single point of failure. For storage, this means there should never be one central location where metadata is stored. GlusterFS and Ceph implement a solution where metadata access is decentralized, which increases the availability and redundancy of storage access.
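
The replication and decentralization requirements above can be illustrated with a small sketch. It is not the placement algorithm of either product; it uses rendezvous hashing as a stand-in for the general idea that any client can compute where the copies of an object live without asking a central metadata server. The function name `placement`, the node names and the object name are all hypothetical.

```python
import hashlib


def placement(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the nodes that should hold copies of object_name.

    Illustration only: rendezvous (highest-random-weight) hashing, not the
    algorithm used by Ceph or GlusterFS.
    """
    def score(node: str) -> int:
        # Every client derives the same score from the same inputs,
        # so no central lookup table is needed.
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the replicas.
    ranked = sorted(nodes, key=score, reverse=True)
    return ranked[:replicas]


# Hypothetical five-node cluster; three copies, matching the Ceph default
# mentioned above.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(placement("vm-image-0042", nodes))
```

Because the result depends only on the object name and the node list, two clients with the same view of the cluster always agree on the location. When a node is added, only the objects for which the new node scores among the top three change location.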

Now let's talk about the differences in the battle of GlusterFS vs. Ceph. As the name suggests, GlusterFS is a file system. It comes from the Linux world and is POSIX-compatible. While you can integrate GlusterFS very easily into a Linux-oriented environment, integrating GlusterFS in a Windows environment is challenging, to say the least.

Ceph takes a different approach to storage: it is built on object storage, the same model used by Swift. In object storage, applications don't write to a file system, but write to storage through direct API access. As a result, the application is capable of bypassing OS features and limitations. If an application has been developed to write to Ceph storage, it doesn't matter which OS is used. The result is that Ceph storage integrates as easily in a Windows world as it does in a Linux one.
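
To make the difference with a file system concrete, here is a toy in-memory sketch of what "writing through an API" means. It is not the Ceph or Swift API; the class `ObjectStore` and its `put` and `get` methods are hypothetical names chosen for illustration. The point is that the application addresses data by an opaque key and never touches OS paths, mount points or file permissions.

```python
import hashlib


class ObjectStore:
    """Toy stand-in for an object storage API (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        # Store the whole object under its key and return a checksum
        # the caller can use to verify the write.
        self._objects[key] = data
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str) -> bytes:
        # Raises KeyError if no object is stored under this key.
        return self._objects[key]


store = ObjectStore()
# The key is just a name. Slashes or backslashes in it have no meaning to
# the store, so the same call works whatever OS the application runs on.
checksum = store.put("reports/2024\\q1.pdf", b"example payload")
assert store.get("reports/2024\\q1.pdf") == b"example payload"
```

A real object store adds networking, authentication and replication behind the same kind of interface, but the application-side model stays this simple.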

API-based access to storage is not the only way applications can access Ceph. For optimal integration, there is also a Ceph block device, which can be used as a regular block device in a Linux environment, enabling you to use Ceph just the way you're used to accessing a regular Linux hard disk. Ceph also has CephFS, a Ceph file system that was written for Linux environments.

Recently, SUSE has added an iSCSI interface, which enables clients running an iSCSI initiator to access Ceph storage just like any other iSCSI target. All of these offerings make Ceph the better choice for heterogeneous environments, where more than just the Linux OS is used.

So Ceph is a more flexible offering that is easier to integrate in non-Linux environments. For many companies, that is enough to build their storage product on Ceph rather than GlusterFS. For environments running Linux only, this feature is not convincing enough, so let's talk about something else that's very important: speed.

In the contest of GlusterFS vs. Ceph, several tests have been performed to prove that one of these storage products is faster than the other, with no distinct winner so far. The GlusterFS storage algorithm is faster, and because of the more hierarchical way in which GlusterFS organizes storage in bricks, this may lead to higher speeds in some scenarios, especially if non-optimized Ceph is used. Ceph, on the other hand, offers sufficient customization features to make it just as fast as GlusterFS. As a result, neither product has a performance advantage convincing enough to settle the choice.

Reality, however, shows that the different methods available to access storage in Ceph are making it the more popular technology. This is borne out by the fact that more companies are considering Ceph technology than GlusterFS, and also by the fact that GlusterFS is still very strongly associated with Red Hat. For example, SUSE has no commercial implementation of GlusterFS, while Ceph has been widely adopted by the open source community, with different products available on the market. That's why, in the GlusterFS vs. Ceph battle, Ceph comes out ahead on flexibility and adoption rather than on raw speed.