overprovisioning (SSD overprovisioning)
What is SSD overprovisioning?
Overprovisioning, in a storage context, is the inclusion of extra storage capacity in a solid-state drive. SSD overprovisioning can increase the endurance of a solid-state drive by distributing the total number of writes and erases across a larger population of NAND flash memory blocks and pages over time.
SSD overprovisioning also gives the flash controller additional buffer space for managing program/erase cycles (P/E cycles). The additional buffer space improves overall SSD performance. It's particularly useful for improving write performance because it increases the probability that a write operation will have immediate access to a pre-erased block. The reserved SSD capacity is not visible to the host as available storage space.
Types of overprovisioning
There are three types of SSD overprovisioning:
- Inherent. According to Seagate, every SSD has at least some overprovisioned capacity that is used for the following:
- the controller's firmware
- failed block replacements
- vendor-specific features
This capacity is inherent in the difference between the binary and decimal notation used to measure data amounts and capacities of SSDs.
For example, an SSD's capacity might be reported as 500 gigabytes (GB), but the actual capacity might be 500 gibibytes (GiB). Each GiB contains 73,741,824 bytes more than a GB, which translates to about 7.37% more capacity. That 7.37% is reserved and not part of the capacity available to the host computer. The host cannot see the overprovisioning. From the host's perspective, only 500 GB -- or 465.7 GiB -- of capacity is available for use.
Inherent overprovisioning is reserved for the overhead that comes with normal P/E cycles and operations, such as the following:
- garbage collection;
- wear leveling;
- TRIM command; and
- other background processes that maintain and optimize the drive.
Adequate overprovisioned capacity is essential to an SSD's long-term performance and reliability. Only the SSD controller can access this capacity.
- Vendor-configured. An SSD manufacturer might also set aside additional capacity to accommodate write-intensive workloads. The added capacity is anywhere from 7% to 28% -- and, sometimes, even more -- in addition to the inherent overprovisioning. This added capacity is also not available to the host.
- User-configured. In some cases, a user might overprovision a drive even further using the capacity that's available to the host. To do this, they would use a vendor-provided tool or create a separate partition that keeps the defined space from being used. This is not the same as the inherent or vendor-configured overprovisioning, where the reserved capacity is available only to the storage controller and is not visible to the user or the host system. User-configured overprovisioning instead comes out of the unreserved capacity that the host can see.
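The decimal-versus-binary arithmetic behind inherent overprovisioning can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes, as in the 500 GB example above, that the drive physically holds the advertised number in GiB while exposing the same number in GB to the host.

```python
GB = 10**9    # decimal gigabyte, in bytes
GIB = 2**30   # binary gibibyte, in bytes


def inherent_overprovisioning(advertised: int) -> dict:
    """Assumes `advertised` GiB of flash are present and `advertised` GB are exposed."""
    physical_bytes = advertised * GIB
    host_bytes = advertised * GB
    reserved_bytes = physical_bytes - host_bytes
    return {
        "host_gib": host_bytes / GIB,            # host-visible capacity in GiB
        "reserved_bytes": reserved_bytes,        # capacity only the controller sees
        "rate_percent": 100 * reserved_bytes / host_bytes,
    }


result = inherent_overprovisioning(500)
print(f"{result['host_gib']:.1f} GiB visible to the host")    # 465.7
print(f"{result['rate_percent']:.2f}% reserved for the controller")  # 7.37
```

Real drives do not always follow this exact GiB-to-GB split, so the result is best treated as an estimate of the inherent share rather than a specification.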
How does overprovisioning work?
There are no fixed rules that define how vendors should overprovision capacity or how that capacity should be reported to customers. Various vendors overprovision different amounts, use the overprovisioned space for different features and refer to capacities in different ways.
For example, one vendor might consider usable capacity to include compressed and deduplicated data, while another considers usable capacity to include only the available physical capacity, without any data reduction. However, vendors typically provide a drive's total available capacity -- usually, in decimal notation -- which is the unreserved capacity available to the host. Sometimes, they refer to this as raw capacity.
An example of overprovisioning
To better understand what overprovisioning means to a drive, consider an SSD with 800 GB of available capacity. In this case, the drive includes 976 GB of total physical capacity, which means it's been overprovisioned by 22%.
In other words, the drive has a 22% overprovisioning rate. The host can access the 800 GB of capacity, some of which the user can set aside as additional overprovisioning. However, the host cannot access the remaining 176 GB; only the controller has access to that capacity.
The vendor might report the overprovisioning rate for this drive as only 15%, leaving off the inherent overprovisioning. The reported rate is sometimes referred to as the marketed overprovisioning. Because the reported rate is based only on the vendor-configured overprovisioning, a vendor might market its drive as not being overprovisioned, essentially reporting a 0% overprovisioning rate. However, the inherent overprovisioning still exists, which is why it's important to understand the true rate when purchasing a drive. The reported rate also doesn't include any user-configured overprovisioning.
To calculate a drive's overprovisioning rate, a buyer can use this formula:
Overprovisioning rate (%) = ((physical capacity - available capacity) / available capacity) x 100
Although this formula is straightforward, calculating a drive's overprovisioning rate can be tricky. First, the buyer needs to know the drive's actual physical capacity, which might not be easy to obtain. Vendors tend to confine their specifications to available capacity and say little about the actual physical capacity.
To confuse matters further, what vendors call physical capacity -- or raw capacity -- is, typically, the capacity available to the host, not the full capacity. Even if a buyer knows the actual physical capacity, they must then consider differences between decimal and binary notation to ensure they're properly accounting for the inherent overprovisioning.
Whether or not the formula is used, a buyer's goal is to determine the overprovisioning rate, which the vendor might be able to provide. Buyers should just be aware that the vendor's figure might reflect only the vendor-configured overprovisioning and might leave out the inherent overprovisioning.
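The rate calculation can be sketched in Python. The function below is a hypothetical helper, and it assumes the rate is defined as reserved capacity divided by host-available capacity, which is the definition that reproduces the 22% figure for the 976 GB drive with 800 GB available. The 100 GB user reservation in the second call is an invented figure for illustration.

```python
def overprovisioning_rate(physical_gb: float, available_gb: float) -> float:
    """Return reserved capacity as a percentage of host-available capacity."""
    if available_gb <= 0 or physical_gb < available_gb:
        raise ValueError("expected 0 < available_gb <= physical_gb")
    return 100 * (physical_gb - available_gb) / available_gb


# Figures from the example: 976 GB physical, 800 GB available to the host.
total_rate = overprovisioning_rate(976, 800)
print(f"{total_rate:.0f}%")  # 22%

# Hypothetical user-configured overprovisioning: 100 GB of the 800 GB
# is left unused, so only 700 GB remains for data.
effective_rate = overprovisioning_rate(976, 800 - 100)
print(f"{effective_rate:.1f}%")  # 39.4%
```

Both capacities must be expressed in the same unit, decimal or binary, or the result will quietly absorb or drop the inherent overprovisioning.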
How overprovisioning relates to performance and cost
In general, the higher the overprovisioning rate, the better a drive performs over the long term, especially for heavy workloads with lots of random writes. The drive is also more likely to meet its expected life span.
For example, a higher rate can lead to a lower write amplification factor because internal SSD operations can be handled more efficiently. That can help both performance and endurance, especially as the drive starts to fill up. Of course, a higher rate also leaves less capacity for storing user data, resulting in a greater per-gigabyte cost for each SSD.
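The cost side of this tradeoff is simple arithmetic, sketched below. The function name, the price and the physical capacity are all hypothetical values chosen for illustration, and the sketch assumes the overprovisioning rate is expressed relative to usable capacity.

```python
def cost_per_usable_gb(price: float, physical_gb: float, op_rate_percent: float) -> float:
    """Price divided by the capacity left for user data at a given overprovisioning rate."""
    # With rate = (physical - usable) / usable, usable = physical / (1 + rate).
    usable_gb = physical_gb / (1 + op_rate_percent / 100)
    return price / usable_gb


# Hypothetical drive: 1,000 GB of physical flash sold at a price of 100.
for rate in (0, 7, 28):
    cost = cost_per_usable_gb(100.0, 1000.0, rate)
    print(f"{rate:>2}% overprovisioning -> {cost:.4f} per usable GB")
```

For a fixed amount of flash at a fixed price, the cost per usable gigabyte rises in direct proportion to (1 + rate), which is the extra cost a buyer weighs against the endurance and write-performance gains.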
Data center overprovisioning
SSD overprovisioning should not be confused with the overprovisioning IT teams do to handle peak demand in their data centers. Data center overprovisioning is not used as much as it was in the past, and its days might be numbered.
Trends such as cloud computing and composable, disaggregated and consumption-based infrastructure services could one day make data center overprovisioning a thing of the past. An administrator can now plan around the organization's average usage and simply add cloud services to accommodate occasional spikes.