How to install ZFS on Linux

ZFS on Linux lets admins correct errors in real time and use solid-state disks for data caching. They can install it from the command-line interface to get these benefits.

ZFS is a file system that provides a way to store and manage large volumes of data, but you must manually install it.

ZFS on Linux does more than file organization, so its terminology differs from standard disk-related vocabulary. The file system collects data in pools. Vdevs, or virtual devices, make up each pool and provide redundancy if a physical device fails. You can store these pools on a single storage disk -- which is not a good idea if you encounter file corruption or if the drive fails -- or many disks.

Benefits of ZFS

It is free to install ZFS on Linux, and it provides robust storage with features such as:

  • on-the-fly error correction;
  • dataset-level encryption (native in OpenZFS 0.8 and later);
  • transactional writes -- writing all or none of the data to ensure integrity;
  • use of solid-state disks to cache data; and
  • use of high-performance software rather than proprietary RAID hardware.

ZFS on Linux offers significant advantages over more traditional file systems such as ext, the journaling file system and Btrfs. With ZFS, a snapshot gives you a crash-consistent point-in-time copy that is easy to back up. ZFS can also support massive file sizes of up to 16 exabytes, provided the hardware can supply that capacity.

How to install ZFS

To install ZFS on a Debian- or Ubuntu-based Linux system, type sudo apt-get install zfsutils-linux -y into the command-line interface (CLI). This example shows how to create a new ZFS pool that mirrors data across two disks, but other ZFS disk configurations are also available. This tutorial uses the zfsutils-linux package.

Initial ZFS install commands
ZFS1: To start the installation process, type sudo apt-get install zfsutils-linux -y into the command-line interface.
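
The install step can be wrapped so it is safe to run more than once. The following is a minimal sketch, assuming a Debian- or Ubuntu-based system with apt-get; the function name install_zfs is a hypothetical helper, not part of ZFS.

```shell
# Sketch: install the ZFS userland tools only if they are missing.
# install_zfs is a hypothetical helper name used for illustration.
install_zfs() {
    # If the zpool command is already on the PATH, there is nothing to do.
    if command -v zpool >/dev/null 2>&1; then
        echo "ZFS tools already installed"
        return 0
    fi
    # Refresh the package index, then install the package used in this tip.
    sudo apt-get update &&
        sudo apt-get install -y zfsutils-linux
}
```

After you define the function in your shell, run install_zfs. On a system that already has the tools, it prints a short message and changes nothing.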

Next, create the vdev disk container. This example adds two 20 GB disks. To identify the disks, use the sudo fdisk -l command. In this case, the two disks are /dev/sdb and /dev/sdc.

Identifying ZFS pool disks
ZFS2: Use sudo fdisk -l to identify which disks you want to use for data storage.
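
If the fdisk output is hard to scan, lsblk can produce a shorter list. The sketch below is one possible approach, not the method shown in the screenshot; filter_disks and list_disks are hypothetical helper names.

```shell
# filter_disks: read "NAME SIZE TYPE" rows on stdin and keep only whole
# disks, printing each as "/dev/NAME SIZE".
filter_disks() {
    awk '$3 == "disk" { print "/dev/" $1, $2 }'
}

# list_disks: ask lsblk for top-level devices only (-d), without a header
# row (-n), and drop anything that is not a disk (loop devices, optical
# drives and so on).
list_disks() {
    lsblk -d -n -o NAME,SIZE,TYPE | filter_disks
}
```

Run list_disks and compare the sizes against the disks you added. In this tip's setup, the two 20 GB entries would be the candidates for the pool.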

Now you can create the mirror setup with sudo zpool create mypool mirror /dev/sdb /dev/sdc.

Depending on what the disks already contain when you create the pool, you might get an error that states "/dev/sdb does not contain an EFI label but it may contain partition information in the MBR."

To fix it, use the -f switch so the full command is sudo zpool create -f mypool mirror /dev/sdb /dev/sdc. The -f switch forces ZFS to use the disks regardless of what they hold, so confirm the device names first. If the command succeeds, it prints no output or error message.
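
Because -f overrides a safety check, it can help to make forcing an explicit choice. The following is a rough sketch under that assumption; create_mirror and the FORCE variable are hypothetical names introduced here for illustration.

```shell
# create_mirror POOL DISK1 DISK2
# Sketch: create a two-disk mirror, refusing to reuse an existing pool name.
# Set FORCE=1 to add -f, which overrides the label check on the disks.
create_mirror() {
    local pool=$1 disk1=$2 disk2=$3
    if [ -z "$pool" ] || [ -z "$disk1" ] || [ -z "$disk2" ]; then
        echo "usage: create_mirror POOL DISK1 DISK2" >&2
        return 2
    fi
    # zpool list succeeds only when a pool with this name already exists.
    if sudo zpool list "$pool" >/dev/null 2>&1; then
        echo "pool '$pool' already exists" >&2
        return 1
    fi
    if [ "${FORCE:-0}" = "1" ]; then
        sudo zpool create -f "$pool" mirror "$disk1" "$disk2"
    else
        sudo zpool create "$pool" mirror "$disk1" "$disk2"
    fi
}
```

For the disks in this tip, the call would be create_mirror mypool /dev/sdb /dev/sdc. Only if that fails with the label error, and you are sure the disks hold nothing you need, repeat it as FORCE=1 create_mirror mypool /dev/sdb /dev/sdc.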

To reduce clutter in the root directory, you can mount the pool under a subdirectory instead of directly under /.
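
One way to keep a pool out of the root directory is to change its mountpoint property. This is a minimal sketch; set_pool_mountpoint is a hypothetical helper, and the /data/mypool path in the usage note is only an example. The zpool create command also accepts -m to choose the mount point when the pool is first created.

```shell
# set_pool_mountpoint POOL DIR
# Sketch: mount an existing pool's root dataset under DIR instead of /POOL.
# ZFS remounts the dataset at the new location when the property changes.
set_pool_mountpoint() {
    local pool=$1 dir=$2
    sudo zfs set mountpoint="$dir" "$pool"
}
```

For example, set_pool_mountpoint mypool /data/mypool would move the pool from /mypool to /data/mypool.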

At this point, the system has created the pool. To check the pool's status, use the sudo zpool status command. The CLI shows the pool's state and the devices it includes.

Command-line interface with pool status
ZFS3: The pool status reported after the pool is created
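
For scripts or monitoring, the full status report is more than you need. The sketch below reads only the pool's health field; pool_is_healthy is a hypothetical helper name.

```shell
# pool_is_healthy POOL
# Sketch: print the pool's health and succeed only when it is ONLINE.
# -H drops the header row and -o health selects a single column.
pool_is_healthy() {
    local health
    health=$(zpool list -H -o health "$1") || return 2
    echo "$1: $health"
    [ "$health" = "ONLINE" ]
}
```

A call such as pool_is_healthy mypool || echo "check the pool" flags any state other than ONLINE, such as DEGRADED after one disk of the mirror fails.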

Your pools should automatically mount and be available within the system. The pools' default location is in a directory off the root folder with the pool name. For example, mypool will mount on the /mypool folder, and you can use the pool just like any other mount point.

If you're not sure where a pool is mounted, use sudo zfs get all | grep mountpoint to show the mount point that each pool and file system uses.

Mount point identification in ZFS
ZFS4: One ZFS command helps you identify which mount point each pool uses.
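
The same information is available without grep by asking for the mountpoint property directly. This is a small sketch, with show_mountpoints as a hypothetical helper name.

```shell
# show_mountpoints
# Sketch: print one tab-separated "dataset  mountpoint" line per file system.
# -H removes headers, -o picks the columns, -t limits output to file systems.
show_mountpoints() {
    sudo zfs get -H -o name,value -t filesystem mountpoint
}
```

Because the output has no header and uses tabs, it is easier to feed into other commands than the filtered zfs get all listing.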

With your data pools online, you can set most ZFS options via the CLI with sudo zfs. To set up more advanced ZFS functions, such as how to snapshot a read-only version of a file system, define storage pool thresholds or check data integrity with the checksum function, search the in-system ZFS resources with man zfs and reference the Ubuntu ZFS wiki.
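
The three advanced functions mentioned above map onto short commands. The sketch below is illustrative only: the helper names are hypothetical, and the mypool/data file system and 10G limit in the usage note are assumptions, not part of the setup in this tip.

```shell
# snapshot_now DATASET
# Sketch: take a read-only snapshot named DATASET@YYYYmmdd-HHMMSS and
# print the snapshot name on success.
snapshot_now() {
    local snap
    snap="${1}@$(date +%Y%m%d-%H%M%S)"
    sudo zfs snapshot "$snap" && echo "$snap"
}

# set_quota DATASET SIZE
# Sketch: cap how much pool space a file system may consume.
set_quota() {
    sudo zfs set quota="$2" "$1"
}

# start_scrub POOL
# Sketch: start a scrub, which reads the pool's data and verifies it
# against the stored checksums.
start_scrub() {
    sudo zpool scrub "$1"
}
```

For example, snapshot_now mypool/data, set_quota mypool/data 10G and start_scrub mypool. A scrub runs in the background, so check its progress with sudo zpool status.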

If you're new to ZFS, double-check commands before you run them and ensure you understand how they move data within pools, address storage limits and sync data.
