Move over, SMPs: Distributed shared memory systems step in
Researchers are building ersatz SMP machines out of commodity x86 boxes, plus specialized hardware and software.
Commodity x86 systems continue their inexorable march toward data center domination. Their latest victim: high-end symmetric multiprocessing (SMP) systems traditionally used in high-performance computing (HPC) because of their large memory footprints.
In the latest twist, increasingly powerful and dense Intel- and AMD-based systems are augmented with specialized hardware and software to create distributed shared memory systems for a small fraction of the price of traditional SMP hardware. Vendors attempting this include ScaleMP and RNA Networks, as well as former SMP luminaries like Hewlett-Packard.
Certainly, commodity x86 systems are not new to HPC, which is dominated by scale-out Linux clusters. But some applications require more main memory than a cluster can provide. In those cases, HPC shops have traditionally turned to such vendors as IBM, SGI and the former Sun Microsystems for large SMP systems with their copious amounts of main memory, said Rich Partridge, vice president and senior analyst for server research at Ideas International.
“There are some HPC apps that partition readily,” Partridge said, and can be broken up into smaller problems with message passing and Linux clusters. “Then there are problems that require a lot of memory and where data needs to be stored locally for performance,” he said.
Money talks, SMP walks
Small, cash-strapped research organizations are pioneering the use of distributed shared memory technologies, Partridge added. At larger academic institutions, there is often more than enough work to keep an SMP busy around the clock, amortizing its cost. Smaller shops, however, may have a harder time justifying the cost of a traditional SMP “because they don’t have enough cycles to be running it all the time,” Partridge said.
Case in point: “We ruled out the traditional SMP boxes because we didn’t have the money,” said a Linux administrator responsible for HPC at a U.S. genomics vendor.
The firm needed a system that could load 300 GB-plus datasets into main memory, but the cost of a single system with that memory capacity proved exorbitant. At the time, 256 GB of RAM was the largest configuration that was still cost-effective, the administrator said.
Instead, the firm opted for vSMP Foundation software from ScaleMP Inc., which aggregates storage and memory resources from several smaller systems over 40 Gbps InfiniBand links. The resulting system was much cheaper than an SMP box while still providing decent performance, thanks to the 40 Gbps InfiniBand, the Linux admin said.
“Any time you go over PCI Express, it’s going to be slower. But 40 Gb is pretty close to native memory speeds,” the admin explained.
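To a Linux guest, a vSMP-style aggregate simply looks like one very large machine, so its capacity can be verified through the ordinary /proc and /sys interfaces. The snippet below is a minimal sketch of that sanity check, assuming a standard Linux image; the 300 GB threshold is borrowed from the genomics workload above, not from any vendor tooling.

```python
# Minimal sketch: sanity-check the memory an aggregated single-system image
# exposes to Linux. Assumes standard /proc/meminfo and /sys NUMA interfaces;
# the 300 GB threshold mirrors the genomics dataset described above.
import glob
import re


def total_memory_gb():
    """Return MemTotal from /proc/meminfo (reported in kB) as gigabytes."""
    with open("/proc/meminfo") as f:
        for line in f:
            match = re.match(r"MemTotal:\s+(\d+)\s+kB", line)
            if match:
                return int(match.group(1)) / (1024 ** 2)
    raise RuntimeError("MemTotal not found in /proc/meminfo")


def numa_node_count():
    """Count NUMA nodes; an aggregated image typically exposes one per donor board."""
    return len(glob.glob("/sys/devices/system/node/node[0-9]*"))


if __name__ == "__main__":
    mem_gb = total_memory_gb()
    print(f"Visible memory: {mem_gb:.0f} GB across {numa_node_count()} NUMA node(s)")
    if mem_gb < 300:
        print("Warning: under the ~300 GB needed to hold the dataset in RAM")
```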
The administrator also considered options from Symmetric Computing, RNA Networks, and the now-defunct 3Leaf Systems, but ruled them out for a variety of reasons. Symmetric Computing had a 300 GB RAM limit and would have required a recompile; RNA only aggregates RAM, not disk; and 3Leaf Systems was built on proprietary hardware, the admin said.
For better or for worse, the presence of proprietary hardware didn’t deter Florida State University from 3Leaf Systems, said Jim Wilgenbusch, FSU director of HPC and a research associate in the department of scientific computing. Compared to ScaleMP, the proprietary ASIC that maintains cache coherency in the 3Leaf system provided “a little extra kick” in performance, he said. “There’s a lot of overhead involved in keeping the cache coherent over InfiniBand.”
Much like the genomics researcher, FSU didn’t have the budget for a traditional SMP. Instead, it used 3Leaf to build a 12-node cluster comprising 132 cores and 528 GB of memory that it can present as a single system or partition into smaller chunks for use by FSU’s various researchers.
“We use it very elastically,” Wilgenbusch said, by tying 3Leaf’s application programming interface into the university’s job scheduler.
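The article doesn’t spell out what that scheduler integration looks like. As a purely hypothetical sketch, the glue logic might amount to routing any job whose memory request exceeds an ordinary node to the aggregated image; every name and number below is illustrative, not 3Leaf’s or FSU’s actual interface.

```python
# Hypothetical sketch of scheduler glue logic: send any job whose memory
# request won't fit on an ordinary cluster node to the aggregated
# shared-memory partition instead. All names and numbers are illustrative;
# they are not 3Leaf's or FSU's actual interfaces.
from dataclasses import dataclass

ORDINARY_NODE_RAM_GB = 48   # assumed per-node memory on the regular cluster
AGGREGATE_RAM_GB = 528      # the 12-node, 528 GB image described above


@dataclass
class Job:
    name: str
    mem_gb: int


def choose_partition(job: Job) -> str:
    """Pick the partition a job should be submitted to."""
    if job.mem_gb > AGGREGATE_RAM_GB:
        raise ValueError(f"{job.name} needs more memory than the aggregate image offers")
    if job.mem_gb > ORDINARY_NODE_RAM_GB:
        return "shared-memory"   # run on the aggregated single-system image
    return "cluster"             # a normal distributed-memory node will do


if __name__ == "__main__":
    for job in (Job("assembly", mem_gb=400), Job("alignment", mem_gb=16)):
        print(f"{job.name} -> {choose_partition(job)}")
```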
But while he thought the 3Leaf technology was good, its proprietary ASIC “was probably lethal,” he admitted. That specialty hardware requirement may have hastened the company’s demise earlier this year. “It was just another hardware component that you have to buy to build the box.”
Commodity cure-alls?
FSU “still uses the heck out of the system” and plans to do so for several years to come, Wilgenbusch said. But when it comes time to replace it, commodity systems may well have enough memory to make add-on technologies like 3Leaf and ScaleMP irrelevant.
In many respects, that’s already starting to happen. Beyond rising core counts, advances in processor design already make it possible to pack previously unheard-of quantities of memory into an Intel Nehalem-EX or AMD Opteron system, and on-board memory extension technology can trick the systems into supporting even more.
For example, the HP DL980 comes with eight Intel Nehalem-EX processors (64 cores in all) and 2 TB of main memory, made possible by a custom memory-switch ASIC, said Marc Hamilton, HP vice president of HPC sales.
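Those figures hang together; here is a quick back-of-the-envelope check, assuming the eight-cores-per-socket maximum of Nehalem-EX, with the rest taken from Hamilton’s description.

```python
# Back-of-the-envelope check of the DL980 figures quoted above.
# Nehalem-EX tops out at eight cores per socket; the rest comes from the article.
sockets = 8
cores_per_socket = 8
total_memory_gb = 2 * 1024   # 2 TB

total_cores = sockets * cores_per_socket            # 8 x 8 = 64 cores
memory_per_socket_gb = total_memory_gb / sockets    # 2048 / 8 = 256 GB per socket

print(f"{total_cores} cores, {memory_per_socket_gb:.0f} GB of RAM behind each socket")
```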
“We sell a lot of them as ‘fat memory nodes’ next to what is otherwise a big two-socket [Intel] Westmere cluster,” he said. Customers of the DL980 in this configuration include Clemson University in Clemson, S.C.
The trend toward increasingly large commodity SMPs will continue, Hamilton predicted. The DL980 has a list price of $250,000. “If you go back a couple of years, a comparable system from SGI or Sun would have cost two to three million dollars,” he said. Nowadays, “you can get a teraflop of computing power in a single [rack unit].”
Let us know what you think about the story; email Alex Barrett, News Director, at [email protected], or follow @aebarrett on Twitter.