What is ZFS?

ZFS is a local file system and logical volume manager created by Sun Microsystems to direct and control the placement, storage and retrieval of data in enterprise-class computing systems.

The ZFS file system and volume manager is characterized by data integrity, high scalability and built-in storage features such as the following:

  • Replication, which copies data to a second location so that it remains available if the original is lost.
  • Deduplication, which eliminates data redundancy and reduces storage overhead.
  • Compression, which reduces the number of bits needed to represent data.
  • Snapshots, which are sets of reference markers for data at a particular point in time.
  • Clones, which are writable copies created from snapshots.
  • Data protection, which is the process of preventing data corruption or loss.
  • Encryption, which encodes data so that only a user who holds the key can read it.

ZFS initially stood for Zettabyte File System, but the word zettabyte no longer holds any significance in the context of the file system. As a 128-bit file system, ZFS has the potential to scale to 256 quadrillion zettabytes.
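The 256 quadrillion figure can be checked with a few lines of arithmetic. This sketch assumes that the 128-bit design is read as 2^128 addressable bytes and that binary units apply, with a zettabyte taken as 2^70 bytes and a quadrillion approximated as 2^50. The variable names are illustrative only.

```python
# Capacity arithmetic for a 128-bit address space.
# Assumptions: 2**128 addressable bytes, 1 zettabyte = 2**70 bytes,
# and 1 quadrillion approximated by 2**50 (binary units throughout).
ADDRESSABLE_BYTES = 2 ** 128
ZETTABYTE = 2 ** 70
QUADRILLION = 2 ** 50

zettabytes = ADDRESSABLE_BYTES // ZETTABYTE
quadrillions_of_zettabytes = zettabytes // QUADRILLION

print(quadrillions_of_zettabytes)  # 256
```

With decimal units the result would differ slightly, so the figure is best read as an order of magnitude.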

How ZFS works

ZFS is designed to run on a single server, potentially with hundreds if not thousands of attached storage drives. ZFS pools the available storage into zpools, made up of one or more virtual devices, referred to as vdevs, and manages all the devices as a single entity. A user can add more storage drives to the pool when the file system needs additional capacity. ZFS is highly scalable and supports a large maximum file size.

Figure: How ZFS organizes data. ZFS integrates the file system and volume manager, which together control the placement, storage and retrieval of data. A ZFS storage pool, or zpool, is created from one or more ZFS virtual devices.
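The relationship between drives, vdevs and a zpool can be pictured with a small data model. The following is a toy sketch, not ZFS code: the class names `Vdev` and `Zpool`, the pool name `tank` and the drive sizes are all hypothetical, and redundancy overhead is ignored so that capacity is simply the sum of the drives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    """A virtual device built from one or more drives (sizes in GB)."""
    drive_sizes_gb: List[int]

    def capacity_gb(self) -> int:
        # Simplification: ignore mirroring and parity overhead.
        return sum(self.drive_sizes_gb)


@dataclass
class Zpool:
    """A storage pool that presents all of its vdevs as one entity."""
    name: str
    vdevs: List[Vdev] = field(default_factory=list)

    def add_vdev(self, vdev: Vdev) -> None:
        # Growing the pool is just a matter of attaching another vdev.
        self.vdevs.append(vdev)

    def capacity_gb(self) -> int:
        return sum(v.capacity_gb() for v in self.vdevs)


pool = Zpool("tank", [Vdev([1000, 1000])])
pool.add_vdev(Vdev([2000, 2000]))   # add capacity later
print(pool.capacity_gb())           # 6000
```

The point of the model is that the file system sees one pool capacity regardless of how many devices sit underneath it.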

ZFS stores at least two copies of metadata each time data is written to disk. The metadata includes information such as the disk sectors where the data is stored, the data block size and a checksum of the binary digits of a piece of data.

When a user requests access to a file, a checksum algorithm performs a calculation to verify that the retrieved data matches the original bits written to disk. If the checksum detects an inconsistency, it flags the bad data. In systems with a mirrored storage pool or the ZFS version of RAID, ZFS can retrieve the correct copy from the other drive and repair the damaged data copy.
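The verify-and-repair read path can be sketched in a few lines. This is a simplified illustration rather than the ZFS implementation: SHA-256 from Python's standard library stands in for whichever checksum the pool is configured to use, the mirror is modeled as a plain list of byte strings, and `read_with_repair` is a hypothetical helper name.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of a block (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies: list, expected: str) -> bytes:
    """Return a copy that matches the stored checksum and repair bad copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no valid copy available")
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good   # overwrite the damaged copy with the good one
    return good


original = b"payload"
stored_checksum = checksum(original)    # kept in metadata at write time
mirror = [b"paylaod", original]         # first copy is silently corrupted

data = read_with_repair(mirror, stored_checksum)
print(data == original, mirror[0] == original)  # True True
```

Without a second copy or parity data, the same check would still detect the corruption but could not repair it.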

ZFS is commonly referred to as a copy-on-write file system, although Oracle describes it as redirect-on-write. When ZFS writes data to disk, it doesn't overwrite data in place. ZFS writes a new block to a different spot on the disk and updates the metadata to point to the newly written block, while also retaining older versions of the data.

A true copy-on-write file system would make an exact replica of a data block in a separate location before overwriting the original block. Before overwriting the data, the system would need to read the block's previous value. A copy-on-write file system requires three input/output (I/O) operations -- read, modify and write -- for each data write. By contrast, a redirect-on-write system requires only one I/O operation, facilitating greater efficiency and higher performance.
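The I/O counts in the comparison above can be reproduced with a toy block device that counts reads and writes. This sketch follows the description in the text rather than measured ZFS behavior: the `Disk` class and both update functions are hypothetical, and the pointer update in the redirect-on-write case is treated as in-memory metadata that costs no extra I/O.

```python
class Disk:
    """Toy block device that counts read and write operations."""

    def __init__(self):
        self.blocks = {}
        self.io_ops = 0
        self._next_addr = 0

    def allocate(self) -> int:
        addr = self._next_addr
        self._next_addr += 1
        return addr

    def read(self, addr: int) -> bytes:
        self.io_ops += 1
        return self.blocks[addr]

    def write(self, addr: int, data: bytes) -> None:
        self.io_ops += 1
        self.blocks[addr] = data


def copy_on_write_update(disk, pointers, name, data):
    addr = pointers[name]
    old = disk.read(addr)               # 1: read the previous value
    disk.write(disk.allocate(), old)    # 2: preserve a copy elsewhere
    disk.write(addr, data)              # 3: overwrite the original in place


def redirect_on_write_update(disk, pointers, name, data):
    new_addr = disk.allocate()
    disk.write(new_addr, data)          # 1: write to a new location
    pointers[name] = new_addr           # metadata update, not counted here


def count_ios(update) -> int:
    disk, pointers = Disk(), {}
    pointers["f"] = disk.allocate()
    disk.blocks[pointers["f"]] = b"v1"  # seed the block without counting I/O
    update(disk, pointers, "f", b"v2")
    return disk.io_ops


cow_ios = count_ios(copy_on_write_update)
row_ios = count_ios(redirect_on_write_update)
print(cow_ios, row_ios)  # 3 1
```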

ZFS is a popular choice for network-attached storage (NAS) systems, running Network File System on top of the file system, as well as in virtual server environments. Another common deployment scenario is layering a clustered file system, such as the General Parallel File System (GPFS) or Lustre, on top of ZFS to enable scaling to additional server nodes. OpenStack users can deploy ZFS as the underlying file system for Cinder block storage and Swift object storage.

Key features of ZFS

ZFS includes several features and capabilities of note:

Snapshots and clones

ZFS and OpenZFS can make point-in-time copies of the file system quickly and efficiently because the system never overwrites data in place, so the older blocks a copy refers to are already on disk. Snapshots are immutable copies of the file system, while clones can be modified. ZFS snapshots and clones are integrated in boot environments with ZFS on Solaris, enabling users to roll back to a snapshot if anything goes wrong when they patch or update the system. Snapshots can also serve as a recovery point after a ransomware attack.
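A toy model helps show why snapshots are cheap when blocks are never overwritten. In this sketch, which is an illustration and not how ZFS is implemented, a file system is a map from names to block IDs, a snapshot is a read-only copy of that map, and a clone is a writable file system that starts from a snapshot and shares the same block store. `ToyFilesystem` and its methods are hypothetical names.

```python
from types import MappingProxyType


class ToyFilesystem:
    """Name -> block ID map on top of an append-only block store."""

    def __init__(self, pointers=None, blocks=None):
        self.blocks = blocks if blocks is not None else {}  # shared store
        self.pointers = dict(pointers or {})

    def write(self, name: str, data: bytes) -> None:
        block_id = len(self.blocks)
        self.blocks[block_id] = data      # old blocks are never overwritten
        self.pointers[name] = block_id

    def read(self, name: str) -> bytes:
        return self.blocks[self.pointers[name]]

    def snapshot(self):
        # Only the pointer map is copied, and the copy is read-only.
        return MappingProxyType(dict(self.pointers))

    def clone(self, snap) -> "ToyFilesystem":
        # A clone is writable and shares blocks with its origin.
        return ToyFilesystem(pointers=snap, blocks=self.blocks)

    def rollback(self, snap) -> None:
        self.pointers = dict(snap)


fs = ToyFilesystem()
fs.write("cfg", b"v1")
snap = fs.snapshot()
fs.write("cfg", b"v2")              # a bad update, for example

clone = fs.clone(snap)
clone.write("cfg", b"v1-clone")     # clones can diverge from the snapshot

fs.rollback(snap)
print(fs.read("cfg"), clone.read("cfg"))  # b'v1' b'v1-clone'
```

Taking the snapshot copies no data blocks, which is why it is fast, and rolling back only swaps the pointer map.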

RAID-Z

RAID-Z, the ZFS version of RAID, spreads data and parity information across multiple drives to enhance fault tolerance and improve performance. If a drive is lost, the system reconstructs its data using the information stored on the system's other drives. Similar to RAID 5, RAID-Z stripes parity information across each drive to permit a storage system to function even if one drive fails. However, with RAID-Z, the striped data is a full block, which is variable in size.

Although RAID-Z is typically compared to RAID 5, it performs some operations differently to address certain longstanding issues with traditional RAID. One issue that RAID-Z addresses is known as the write hole effect, where a system can't determine which data or parity blocks have been written to disk because of a power failure or catastrophic system interruption. Vendors of systems that use traditional RAID typically resolve the problem through the use of an uninterruptible power supply or dedicated hardware.
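Reconstruction from single parity can be demonstrated with XOR, the operation behind RAID 5-style parity. The sketch below is a minimal illustration under that assumption. It does not model RAID-Z's variable-width stripes or its on-disk layout, and `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives plus one parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 fails: rebuild its contents from the survivors and the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, any single missing block can be recovered from the remaining blocks and the parity.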

RAID-Z2 and RAID-Z3

RAID-Z2 supports the loss of two storage drives, similar to RAID 6, and RAID-Z3 can tolerate the loss of three storage devices. Users have the option to arrange drives in groups, as with conventional RAID. For instance, a system with two groups of six drives set up as RAID-Z3 could tolerate the loss of three drives in each group.
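The grouping rule can be expressed as a simple check: a pool survives as long as no single group loses more drives than its parity level allows. This is a sketch of that rule only. The dictionary keys are used here as plain labels, and `pool_survives` is a hypothetical function.

```python
# Number of drive failures each level tolerates within one group.
PARITY_DRIVES = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def pool_survives(level: str, failures_per_group) -> bool:
    """True if every group has lost no more drives than its level tolerates."""
    limit = PARITY_DRIVES[level]
    return all(failed <= limit for failed in failures_per_group)


# Two groups of six drives set up as RAID-Z3.
print(pool_survives("raidz3", [3, 3]))  # True: three losses in each group
print(pool_survives("raidz3", [4, 0]))  # False: one group lost four drives
```

The second case shows that the total number of failed drives matters less than how the failures are distributed across groups.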

Compression

Inline data compression is a built-in feature in ZFS and OpenZFS to reduce the number of bits necessary to store data. ZFS and OpenZFS each support a number of compression algorithms. Users can enable or disable inline compression.
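The effect of compression on a block is easy to see with Python's standard library. In this sketch, zlib's DEFLATE stands in for whichever algorithm a pool is configured to use, and the 4 KB repetitive block is made-up sample data chosen to compress well.

```python
import zlib

block = b"ZFS " * 1024                 # 4,096 bytes of repetitive sample data
compressed = zlib.compress(block)      # what would be written to disk
restored = zlib.decompress(compressed) # what a reader gets back

ratio = len(block) / len(compressed)
print(len(block), len(compressed) < len(block), restored == block)
```

Real-world ratios depend heavily on the data; already compressed media files, for example, typically shrink very little.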

Deduplication

Inline data deduplication is a built-in feature in ZFS and OpenZFS that eliminates redundant data, making storage more efficient. ZFS and OpenZFS compare block checksums to find duplicate data. Users can enable or disable inline deduplication.
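Checksum-based deduplication amounts to keeping a table keyed by block checksum and storing each unique block once. The following is a minimal sketch of that idea, not the ZFS deduplication table: `DedupStore` is a hypothetical class, and SHA-256 is assumed as the checksum.

```python
import hashlib


class DedupStore:
    """Store each unique block once and count references to it."""

    def __init__(self):
        self.blocks = {}      # checksum -> block data
        self.refcounts = {}   # checksum -> number of references

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data          # first time this block is seen
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key


store = DedupStore()
first = store.write(b"same block")
second = store.write(b"same block")      # duplicate: no new data stored
store.write(b"different block")

print(len(store.blocks), store.refcounts[first])  # 2 2
```

The table itself has to be held somewhere fast, which is one reason deduplication is commonly associated with higher memory use.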

ZFS send/receive

ZFS and OpenZFS enable a snapshot of the file system to be sent to a different server node, letting a user replicate data to a separate system for purposes such as backup or data migration to cloud storage.
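The send/receive idea can be sketched as producing a stream that describes a snapshot, or the difference between two snapshots, and applying it on another system. This toy works at the level of named items in dictionaries, whereas the real feature operates on the file system's own structures. `make_stream` and `receive` are hypothetical helpers.

```python
def make_stream(old_snap: dict, new_snap: dict) -> dict:
    """Describe what changed between two snapshots (name -> data)."""
    changed = {k: v for k, v in new_snap.items() if old_snap.get(k) != v}
    deleted = [k for k in old_snap if k not in new_snap]
    return {"changed": changed, "deleted": deleted}


def receive(target: dict, stream: dict) -> None:
    """Apply a stream to the copy held on the receiving system."""
    target.update(stream["changed"])
    for name in stream["deleted"]:
        target.pop(name, None)


snap1 = {"a": b"1", "b": b"2"}
snap2 = {"a": b"1", "b": b"3", "c": b"4"}

backup = {}
receive(backup, make_stream({}, snap1))      # full send of the first snapshot
receive(backup, make_stream(snap1, snap2))   # later send carries only changes

print(backup == snap2)  # True
```

Sending only the differences between snapshots is what keeps repeated backups to a remote system small.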

Security

ZFS and OpenZFS support delegated permissions and fine-grained access control lists to manage who can perform administrative tasks. Users have the option to set a ZFS file system as read-only, so the data can't be changed. Oracle supports encryption in ZFS on Solaris, and OpenZFS has included native encryption since version 0.8.

ZFS advantages and limitations

ZFS' advantages include the following:

  • Ease of use. ZFS integrates the file system and volume manager so users don't have to obtain and learn separate tools and sets of commands.
  • Rich feature set. ZFS offers its data services at no additional cost, because it's built into the Oracle Solaris operating system (OS). Open source OpenZFS is freely available.
  • Expandable storage. Drives can be added to the storage pool to expand the file system. Traditional file systems require the disk partition to be resized to increase capacity, and users often need volume management products to help them.
  • Space efficiency and data integrity. Copy-on-write snapshots initially consume almost no additional space and capture the state of the file system at a specific point in time. ZFS uses checksums to verify data integrity, so that if an issue is identified, ZFS can fix it using redundant copies or parity data.

There are drawbacks, however:

  • Single server. ZFS is limited to running on a single server, in contrast to distributed or parallel file systems, such as GPFS and Lustre, which can scale out to multiple servers.
  • Memory requirements. ZFS needs sufficient random access memory for caching and metadata management.
  • Complicated feature set. ZFS' rich feature set can at times make the software complicated to use and manage. Features such as the integrated ZFS checksum algorithms require additional processing power and can affect performance.
  • Licensing issues. In the Linux community, there are various opinions on licensing with respect to the redistribution of the ZFS code and binary kernel modules. For instance, Red Hat considers it problematic to distribute code licensed under the Common Development and Distribution License (CDDL) with code licensed under the GNU General Public License (GPL). By contrast, Canonical, which distributes Ubuntu, has determined it's in compliance with the terms of the CDDL and GPL licenses.

History of ZFS

Sun engineers began development of ZFS in 2001 for the company's Unix-based Solaris OS. In 2005, Sun released the ZFS source code under the CDDL as part of the open source OpenSolaris OS. A community of developers from Sun and other vendors worked on enhancements to the code and ported ZFS to additional operating systems, including FreeBSD, Linux and macOS.

The OpenSolaris project ended after Oracle acquired Sun in 2010 and trademarked the term ZFS. Engineers at Oracle continued to enhance and add features to ZFS on Solaris. Oracle uses its proprietary ZFS code as the foundation for Oracle Solaris, the Oracle ZFS Storage Appliance and other Oracle technologies.

In 2013, a development community started a new open source project called OpenZFS. It was based on the ZFS source code in the final release of OpenSolaris. The open source community continues to add features, improvements and bug fixes to the OpenZFS code.

ZFS vs. OpenZFS

Oracle's ZFS and open source OpenZFS derive from the same ZFS source code. On separate tracks, Oracle and the open source community have added extensions and made significant performance improvements to ZFS and OpenZFS, respectively. The Oracle ZFS updates are proprietary and available only in Oracle technologies. Updates to the open source OpenZFS code are freely available.

The enhancements that Oracle has made to ZFS since its release include the following:

  • Encryption.
  • Support for the persistence of compressed data across OS reboots in the L2 adaptive replacement cache.
  • Bootable Extensible Firmware Interface labels that provide support for physical disks and virtual disk volumes greater than 2 TB in size.
  • Default user and group quotas.
  • Poll and file system monitoring.

Updates to OpenZFS include the following:

  • Additional compression algorithms.
  • Resumable send/receive, which allows a long-running ZFS send/receive operation to restart from the point of a system interruption.
  • Compressed send/receive, which allows the system to send compressed data from one ZFS pool to another without having to decompress and recompress the data when moving from the sending node to the destination.
  • Compressed ARC, which allows ZFS to keep compressed data in memory, enabling a larger working data set in cache.

OSes that support OpenZFS include macOS; FreeBSD; Illumos, which is based on OpenSolaris; and Linux variants such as Debian, Gentoo and Ubuntu. OpenZFS works on all Linux distributions, but only some commercial vendors provide it as part of their distributions. According to the OpenZFS website, several companies have commercial products built on OpenZFS projects, including Amazon Web Services, Datto, Delphix, Joyent, Nexenta and Spectra Logic.

ZFS use cases

ZFS and OpenZFS appeal to enterprises that need to manage large quantities of data and ensure data integrity. Users include scientific institutions, national laboratories, government agencies, financial firms, telecommunications providers, and media and entertainment companies.

ZFS is a popular solution for building scalable and resilient storage systems in enterprise environments. Key applications of ZFS include the following:

  • Supporting critical workloads, including data analytics, databases and virtualization.
  • Efficiently managing thousands of virtual machines (VMs) in secure cloud environments.
  • Speeding up data-intensive workload processing compared to other systems.
  • Supporting mission-critical databases with copy-on-write snapshots and data integrity features.
  • Providing scalability and performance enhancements for virtualization and VMs.
  • Facilitating data backup and recovery through incremental snapshots that lower storage overhead.
  • Using deduplication to optimize storage space and support large data files, such as media files, content libraries and archived data.
  • Supporting high-performance computing, such as scientific computing and data analytics.
  • Using snapshots, data set clones and other features to support software development and testing without impacting production systems.
  • When used with NAS, supporting file sharing and collaborative processing.

Oracle ZFS Storage Appliance

Whereas ZFS is a logical file system, Oracle's ZFS Storage Appliance incorporates ZFS capabilities into a hardware appliance to meet enterprise storage requirements. Designed specifically for Oracle workloads and Oracle Cloud, the appliance combines file, block and object storage in the same device. It's based on flash storage technology and supports petabytes of storage capacity.

Oracle ZFS Storage Appliance integrates with Oracle Database to prioritize I/O activities. It has an array of administrative features and communicates effectively with Oracle Cloud Infrastructure storage.

The future of ZFS

Oracle and OpenZFS are both likely to continue releasing improvements and new features. The following are examples of how ZFS is likely to change and be enhanced in the coming years:

  • Scalability will increase so that ZFS will be able to support massive data sets efficiently.
  • Performance improvements will enhance throughput while reducing latency.
  • Smart data tiering might let important data be stored on high-speed solid-state drives, while hard disk drives handle less critical data.
  • Improved data deduplication parameters will enhance storage space use.
  • Use in cloud storage environments will increase.
  • Security enhancements will include more powerful encryption, snapshots with greater security, and protection for data in transit and at rest.
  • The use of artificial intelligence capabilities could improve ZFS performance.
  • ZFS compatibility with more OSes is likely.
  • Support for more storage environments is expected.

This was last updated in July 2024