Optimize mainframe processor performance with vertical polarization
To increase mainframe processor capacity and speed, IBM turned to vertical polarization. Learn how to take advantage of it through a feature called HiperDispatch.
To prepare for the inevitable end of Moore's Law, IBM is looking for other means to increase processor capacity. Its strategies include instruction pipelining, simultaneous multithreading and larger processor caches. These techniques sometimes make mainframe performance variable, if not erratic, depending on what's going on in the system. One way to get top performance out of a mainframe is to optimize processor vertical polarization.
The mainframe processor back story
Recent mainframe processor models depend heavily on cache to feed the instruction pipeline. IBM's latest mainframe processor, z13, comes with four levels of cache that vary in size and in distance from the central processor (CP). Each core has its own Level 1 (L1) and L2 caches. L3 cache is shared by every core on the chip, while L4 may be accessed by any processor in the drawer. Data must be in L1 cache for the processor to use it.
Mainframes are known for running disparate workloads at high rates of efficiency. This means a lot of context switching between the hardware states of, say, an online transaction and a batch job grinding away at a large file.
For example, as a production logical partition (LPAR) runs, it builds its cache working set, which leads to better performance as the mainframe processor spends less time retrieving data from memory or lower cache. Then, at some point, the production LPAR loses control of the CP to a test LPAR. The test LPAR goes through the same process of gathering its favorite data into cache, gaining efficiency only to give control back to the production system. On fully loaded processors, this thrashing leads to slower performance and higher CPU consumption.
To remedy the situation, a mainframe can exploit the idea of vertical polarization, which aims to keep LPARs on the same physical processor and to spend less time loading and purging cache. To increase vertical polarization, IBM introduced a feature called HiperDispatch, along with a couple of ways to measure it.
When HiperDispatch is enabled, the operating system dispatcher collaborates with the mainframe hypervisor, Processor Resource/Systems Manager (PR/SM). The two work together to ensure any LPAR runs consistently on the same group of physical processors, thus preserving cache contents, minimizing cache misses and increasing processing efficiency.
The first measurement of cache efficiency is relative nest intensity (RNI). IBM has a complicated formula for RNI that differs for each mainframe processor model, based on the time it takes to fetch data from the varying levels of cache and memory. A lower RNI indicates efficient cache usage, which means the CPU spends less time waiting for instructions and data.
The second measurement is cycles per instruction (CPI). This metric is the average number of clock cycles it takes to execute an instruction. The further the processor reaches into cache or memory, the more cycles an instruction will take, so the lower the CPI, the better. It's also an object lesson in how the same program with the same data can consume different amounts of CPU based on competition for cache.
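As a rough illustration of why CPI matters, the sketch below applies the standard relationship between instruction count, CPI and clock rate. The function names and the workload figures are made up for illustration; they are not IBM measurements, and real CPI values come from hardware instrumentation data rather than a hand calculation.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Average number of clock cycles spent per executed instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def cpu_seconds(instructions: int, cpi: float, clock_hz: float) -> float:
    """CPU time consumed: total cycles divided by cycles per second."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return instructions * cpi / clock_hz


if __name__ == "__main__":
    # Hypothetical workload: same program, same data, same instruction count.
    instructions = 10_000_000_000
    clock_hz = 5_000_000_000  # assumed 5 GHz clock for the example

    # Assumed CPI values: one run with a warm cache, one competing for cache.
    for label, cpi in (("warm cache", 2.0), ("contended cache", 3.0)):
        seconds = cpu_seconds(instructions, cpi, clock_hz)
        print(f"{label}: CPI {cpi} -> {seconds:.1f} CPU seconds")
```

With these assumed numbers, the identical program consumes half again as much CPU time when its CPI rises from 2.0 to 3.0, which is the effect cache contention has on real workloads.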
Calculate vertical CPs
CP verticality has three designations:
- Vertical High (VH) -- Physical processor effectively dedicated to an LPAR;
- Vertical Medium (VM) -- Physical processor that may be shared between LPARs; and
- Vertical Low (VL) -- Physical processors with no guaranteed share for the LPAR, which will be parked until needed.
In general, PR/SM assigns vertical polarity based on an LPAR's weight and number of logical and physical processors. LPAR weight defines a partition's relative importance and CPU share when a processor becomes busy.
When HiperDispatch is enabled, determine the number of CPs an LPAR can use by dividing the individual LPAR weight by the total of all LPAR weights, then multiplying the result by the number of physical processors. In practice, with z/OS and PR/SM collaborating, each z/OS system does its best to dispatch work on its VH processors. If the VH CPs become too busy, z/OS might send work to a VM processor.
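The share calculation described above can be sketched in a few lines. The LPAR names, weights and processor count here are hypothetical, and the function only computes each partition's share of physical CPs. How PR/SM then turns that share into VH, VM and VL processors involves additional rules that this sketch does not attempt to reproduce.

```python
def lpar_cp_share(weights: dict[str, float], physical_cps: int) -> dict[str, float]:
    """Return each LPAR's share of physical CPs.

    share = (LPAR weight / total of all LPAR weights) * physical CPs
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("total LPAR weight must be positive")
    if physical_cps <= 0:
        raise ValueError("physical_cps must be positive")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return {name: weight * physical_cps / total_weight
            for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical configuration: two LPARs sharing eight physical CPs.
    weights = {"PROD": 700, "TEST": 300}
    for name, share in lpar_cp_share(weights, physical_cps=8).items():
        print(f"{name}: {share:.1f} CPs")
```

In this made-up configuration, PROD is entitled to roughly 5.6 CPs and TEST to roughly 2.4. The whole-number part of a share hints at how many processors could be effectively dedicated to the partition, while the fractional remainder is what ends up on shared processors.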
Since PR/SM understands the underlying cache structure for the processor, it assigns CPs to an LPAR on the same or nearby chips to shorten the time needed to retrieve data. Given PR/SM's complex and opaque decision making, a systems programmer's best bet is to download IBM's LPAR Design Tool. This tool, in the form of a complex Excel spreadsheet, allows users to plan their LPAR configurations to optimize their hardware.