Use a Linux file system journal for data integrity, performance
Understand the three different file system journaling modes for Linux, as well as which mount option provides the best levels of data protection and performance.
Linux performance issues are often related to the storage channel, even if you use external storage, such as a storage area network or distributed storage.
As a result, it's important to understand how the Linux I/O scheduler handles workloads on the storage channel -- and how a Linux file system journal can play a crucial role in efficiently dealing with I/O.
A journal keeps track of changes that are not yet committed to the file system. This ensures that if something goes wrong before data is actually committed, you can recover the file system faster, while minimizing the chances of data loss or corruption.
Journaling file systems were a clear improvement when they were introduced in the mid-1990s, since the alternative was a complete file system check. A server with a nonjournaling file system that goes down needs to check all of its file system metadata when it comes up again -- a process that can take hours.
File system journal modes
Although a file system journal is good for data integrity, it is not so good for performance. To mitigate that issue, current Linux file systems offer three different journaling modes that can be selected when mounting a journaling file system.
The data=journal mount option offers the highest level of data protection. This mode writes both file system metadata and data to the journal before they are actually committed to disk. However, every write has to occur twice, which places a heavy burden on the write performance of a system. For this reason, administrators don't often use this journaling mode. If you do use this mode, place the journal on a dedicated journal device on another hard disk to spread the workload.
The opposite of data=journal is the data=writeback option. This journaling mode does not offer full data protection -- it only protects the file system metadata. In the case of a server outage, there is a risk of losing data, but at least the server won't spend hours checking the integrity of the file system that had open files when the system crashed. Despite its shortcomings, this mode offers the best possible performance.
The third mode, data=ordered, offers a higher level of protection than data=writeback, but still doesn't offer the absolute guarantee that data=journal provides. As in writeback mode, only metadata goes into the journal, but data blocks are written to disk before the related metadata is committed. This mode is the default journaling mode on most Linux distributions, because it offers reasonable protection without paying a performance price that is too high.
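To see which of the three modes a mounted file system was given, you can inspect its mount options. The following is a minimal sketch in Python, assuming the line follows the usual /proc/mounts layout (device, mount point, type, options, and two numeric fields). The function name journal_mode and the sample line are illustrative and not part of any tool. Note that a file system mounted with its default mode may not list a data= option at all, in which case the function returns None.

```python
from typing import Optional

# The three journaling modes discussed above.
JOURNAL_MODES = {"journal", "ordered", "writeback"}


def journal_mode(mounts_line: str) -> Optional[str]:
    """Return the data= mode named in one /proc/mounts-style line, or None.

    None means no data= option was listed, which usually indicates
    that the file system is using its default mode.
    """
    fields = mounts_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    # The fourth field holds the comma-separated mount options.
    for option in fields[3].split(","):
        key, _, value = option.partition("=")
        if key == "data" and value in JOURNAL_MODES:
            return value
    return None


# Hypothetical sample line; on a real system, read lines from /proc/mounts.
sample = "/dev/sdb1 /srv ext4 rw,relatime,data=writeback 0 0"
print(journal_mode(sample))
```

On a live server, the same function could be applied to each line read from /proc/mounts.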
Journals and performance
For many servers, using a file system journal offers better protection for your data. On some high-workload servers that suffer from poor write performance, however, you may want to reconsider journaling. There are a few choices: The administrator may select another journaling mode, use another file system or completely switch off the journal on a journaling file system.
In many situations where the journal is responsible for a higher write load -- as shown by utilities such as iotop -- the first step is to set the journaling mode to data=writeback. Many servers can benefit from this option. In some cases, however, this option is not enough, and the administrator should use a file system without a journal. If that is the case, either select a file system that doesn't have journaling to start with, such as Ext2, or disable the journal on a file system that supports that option, such as Ext4.
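One common way to make a journaling mode persistent is an entry in /etc/fstab. The sketch below builds such an entry as a string, assuming the standard six-field fstab layout. The helper name fstab_entry, the mount point /srv/data and the choice of "defaults" plus a final fsck pass value of 2 (typical for a non-root file system) are illustrative assumptions, so adapt them to your own system before use.

```python
VALID_MODES = ("journal", "ordered", "writeback")


def fstab_entry(device: str, mount_point: str,
                fs_type: str = "ext4", data_mode: str = "writeback") -> str:
    """Build one /etc/fstab line that mounts a file system with data=<mode>."""
    if data_mode not in VALID_MODES:
        raise ValueError(f"unknown journaling mode: {data_mode!r}")
    options = f"defaults,data={data_mode}"
    # Fields: device, mount point, type, options, dump flag, fsck pass.
    return f"{device} {mount_point} {fs_type} {options} 0 2"


# /dev/yourdevice is a placeholder; /srv/data is a hypothetical mount point.
print(fstab_entry("/dev/yourdevice", "/srv/data"))
```

The function only produces text. Review the resulting line before adding it to /etc/fstab, since a malformed entry can prevent a file system from mounting at boot.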
Ext4 is a more recent journaling file system, with advanced features and a faster kernel module. Even if some distributions still use Ext2 to format the /boot file system, there is no good reason to do so. To switch off the journal on Ext4, use the command tune2fs -O ^has_journal /dev/yourdevice.
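If you script this step, it helps to build the command as an argument list, so that the ^ character is passed to tune2fs literally instead of being interpreted by a shell. The following is a small sketch. The helper name disable_journal_command is hypothetical, and /dev/yourdevice is the same placeholder used above. The sketch only prints the command and leaves the actual run commented out, because tune2fs generally expects the file system to be unmounted or mounted read-only before the has_journal feature can be cleared.

```python
import shlex
from typing import List


def disable_journal_command(device: str) -> List[str]:
    """Return the argument list that removes the journal from an Ext4 device."""
    if not device.startswith("/dev/"):
        raise ValueError("expected a device path under /dev/")
    return ["tune2fs", "-O", "^has_journal", device]


cmd = disable_journal_command("/dev/yourdevice")
# Show the command for review; shlex.join quotes arguments where needed.
print(shlex.join(cmd))

# To actually run it (as root, with the file system unmounted), one option is:
# import subprocess
# subprocess.run(cmd, check=True)
```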
Before you switch off the journal, make sure this doesn't get you into trouble later. In some environments, journaling is used to protect vital application data. In that case, the file system will be mounted with data=journal, as this is the only journaling option that really secures your data. If the data option is not set to journal, the journal will not give you complete protection against corruption, but just the convenience of not having to wait for a file system integrity check after a reboot that follows a server crash. If that situation applies to you, and you've seen that journaling slows down your server, switch it off.