Seagate's open source Cortx object store targets HAMR HDDs
Seagate launched new open source object storage software that is designed to exploit its highest capacity hard disk drives in supported Lyve Drive Rack converged infrastructure.
Seagate Technology entered the object storage fray with new open source-based Cortx software designed to exploit its highest capacity hard disk drives in a converged infrastructure rack that targets enterprises with massive data sets.
The Cortx object storage software resulted from a multiyear collaborative effort between Seagate and partners in the European Union-funded Sage Exascale HPC project, including commercial technology providers, supercomputing centers and users with significant data and compute challenges.
Seagate launched an open source community on GitHub to develop and enhance the Cortx software. The vendor also designed a reference architecture to give customers an option to buy a hardened, Seagate-supported subset of the Cortx software to run on new Lyve Drive Rack converged infrastructure that will bundle certified server, storage and networking gear.
Lyve Drive Rack is due late this year from select Seagate channel partners. The first release will include options for servers from Dell and Supermicro, networking gear from Nvidia's Mellanox, and storage drive enclosures and HDDs from Seagate. Cortx will offer up to 1.5 PB of raw storage capacity with 18 TB HDDs, and customers can use the HDDs at variable capacity.
HAMR HDDs to ship this year
Seagate CEO Dave Mosley noted that his company's HAMR HDDs would start shipping later this year at 20 TB and ramp to 50 TB by 2026, and its MACH.2 multi-actuator technology would help to boost performance as the capacity scales. Mosley said hyperscale public clouds generally adopt the highest capacity drives as soon as they become available because they've built software stacks designed to exploit the latest cost-saving HDD advancements. He claimed enterprise customers that adopt Cortx could reap the same benefits the hyperscalers do.
Potential use cases for the Cortx object storage software include artificial intelligence, machine learning, analytics, IoT, video surveillance, high performance computing, backup and archiving. Seagate said early Cortx adopters include Toyota Motor Corp., the French Alternative Energies and Atomic Energy Commission, and the UK Atomic Energy Authority.
Seagate's Cortx developers had to confront a number of architectural challenges in building a system that could reliably and durably store petabytes and potentially exabytes of data with the mass-capacity HDDs. One of the key technologies they focused on is multi-layer erasure coding to enable users to protect data at the local and network levels at roughly the same capacity overhead as conventional erasure coding.
"If you take the same amount of protection and do half of it at one level and half of it at the other level, you get an additional nine of durability," said John Bent, managing technologist and Cortx community director at Seagate.
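To make the capacity claim concrete, the short Python sketch below compares the raw-capacity overhead of a single erasure code with that of a layered scheme that splits the same number of parity units across a network tier and a local tier. The 16+4 and 16+2 layouts are hypothetical parameters chosen for illustration, not Cortx's actual configuration, and the sketch models capacity only, not durability.

```python
from fractions import Fraction


def overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw capacity consumed per unit of user data for a k+m erasure code."""
    return Fraction(data_units + parity_units, data_units)


def two_tier_overhead(network: tuple, local: tuple) -> Fraction:
    """Overhead when a network-level code is layered on a local-level code.

    Each tier is a (data_units, parity_units) pair; the overheads multiply
    because every network-level unit is itself protected locally.
    """
    return overhead(*network) * overhead(*local)


# Hypothetical layouts: four parity units in one tier versus two in each tier.
single_tier = overhead(16, 4)
layered = two_tier_overhead((16, 2), (16, 2))

print(f"single tier 16+4:        {float(single_tier):.4f}x raw per usable byte")
print(f"two tiers 16+2 over 16+2: {float(layered):.4f}x raw per usable byte")
```

Under these assumed parameters the layered scheme costs only slightly more raw capacity than the single code, which is in line with the "roughly the same capacity overhead" described above.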
Importance of multi-tier erasure coding
Gary Grider, HPC division leader at Los Alamos National Laboratory (LANL), said multi-tier erasure coding is important for any organization that has tens to hundreds of thousands of HDDs that might be subject to a large-scale correlated failure. He said the lab sometimes has to stripe 1 PB files across thousands of HDDs to write them at a reasonable rate. If a power unit failure or some other catastrophic event takes down a rack or all the drives in a stripe, the lab could lose the giant file.
"Those kinds of events are hard to protect against with just simple erasure [coding]. But two tiers of erasure does a super good job of protecting you against the normal failure and these unusual but very painful events," Grider said. "And it does so in a pretty cheap way."
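Grider's point about correlated failures can be illustrated with a deliberately simplified Python model. It assumes one stripe unit per drive, treats a rack as lost once it has more failed drives than its local parity can repair, and lets the network-level code absorb a limited number of lost racks. The function names, rack sizes and parity counts are all hypothetical; neither LANL's nor Cortx's actual layout is described in this article.

```python
def rack_recoverable(failed_in_rack: int, local_parity: int) -> bool:
    """A rack can repair itself if failures do not exceed its local parity."""
    return failed_in_rack <= local_parity


def two_tier_survives(failed_per_rack, local_parity: int, network_parity: int) -> bool:
    """Data survives if the network code can cover every unrecoverable rack."""
    lost_racks = sum(
        1 for failed in failed_per_rack
        if not rack_recoverable(failed, local_parity)
    )
    return lost_racks <= network_parity


def single_tier_survives(failed_in_stripe: int, parity: int) -> bool:
    """A single k+m stripe survives only if failures do not exceed m."""
    return failed_in_stripe <= parity


# Hypothetical system: 18 racks of 18 drives, two parity units at each tier.
scattered = [2, 1, 0, 2] + [0] * 14   # ordinary, uncorrelated drive failures
rack_down = [18] + [0] * 17           # power unit failure takes out one rack
three_down = [18, 18, 18] + [0] * 15  # more racks lost than network parity

print(two_tier_survives(scattered, 2, 2))   # handled by local repair alone
print(two_tier_survives(rack_down, 2, 2))   # handled by the network tier
print(two_tier_survives(three_down, 2, 2))  # exceeds what both tiers cover
print(single_tier_survives(18, 4))          # stripe confined to the failed rack
```

In this toy model the local tier handles the everyday drive failures and the network tier handles the rare rack-wide event, while a single stripe confined to the failed rack is lost regardless of how much parity it carries.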
Grider said Los Alamos engineers devised a form of multi-tier erasure coding several years ago when they couldn't find an object store capable of operating at the simultaneous bandwidth and data protection levels they needed. But he said the lab would prefer not to maintain its own object storage code.
"Am I interested in Cortx? You bet. It's all about providing reliable, nearline storage at massive scale and high bandwidth at low cost," Grider said. "The plan to provide multi-tier erasure as a future feature makes Cortx a viable candidate for lab use."
The initial R1 release of Lyve Drive Rack will not support every feature of the open source Cortx project. For instance, multi-level erasure coding and NFS support are roadmap items for the supported product. Another important item in the works with the Seagate drive enclosures is Reman, which can reduce HDD rebuild times by fixing only the portion of a drive that fails rather than the whole device. A Lyve Drive Rack R2 release is due in mid-2021, according to Raj Das, a senior director of product line management at Seagate.
Cortx available on GitHub
Because Seagate made the Cortx software open source, the core components are freely available from GitHub under the Apache License 2.0 for use with any hardware, including HDDs from competitors Western Digital and Toshiba. But Bent said the software has optimizations that might work best with Seagate drives.
"Of course we want people to use our drives, [but] nothing will stop others from adding optimizations for their devices," Bent said.
Seagate launched the Cortx software and hardware-and-software reference architecture in response to customer feedback after the release of its shingled magnetic recording (SMR) HDDs, according to Bent. He said Seagate left it to customers to figure out how to modify their software to accommodate the high-capacity SMR HDDs and subsequently heard adoption would have been better if it had helped with the application changes. Project Cortx aims to provide customers with all the components they would need to deploy mass-capacity HDDs such as the upcoming new HAMR models.
"Most of the object stores out there are not optimized for a single hard drive," said Enrico Signoretti, a research analyst at GigaOm. "It's understandable that Seagate wanted to be sure that all the code is optimized to handle their hard drives the best. They want to sell their hard drives, so they want to be sure you are buying the entire stack from them."
Signoretti noted that Western Digital, Seagate's competitor, already tried to put together all the necessary components to build an object store, and those efforts ultimately didn't work. Western Digital sold its ActiveScale object storage to Quantum earlier this year.
Potential customer benefits
But Signoretti can envision potential benefits for large customers that want a single provider. He said they could get a special deal if they buy high-capacity HDDs as part of the object store package from Seagate, or they could request special features, since Seagate will manage the Cortx project.
Seagate claimed that more than 30 early adopter organizations and over 50 developers have already been working with Cortx through GitHub. But Signoretti expects Seagate will have a hard time building a large community for Cortx because open source alternatives Ceph and MinIO already have well-established communities.
From a user standpoint, Signoretti sees the Cortx object storage making the most sense for organizations seeking a multi-petabyte object store at a low cost per GB for backup, archiving and any other workloads that need bandwidth but don't require sustained high IOPS.
Bent said developers optimized Cortx for mass-capacity data sets and low-latency searches, with a scale-out key-value store integrated into each Cortx server. He said Seagate is collaborating with Intel on the open source Distributed Asynchronous Object Storage (DAOS), designed for Intel's Optane memory, to address workloads requiring intensive, sustained IOPS.
"We envision that most data centers will need a mix of storage devices -- a relatively small amount of IOPS devices such as Optane and a relatively large amount of mass capacity devices such as Seagate's HAMR drives," Bent said. "We are working with Intel to develop integrated solutions such that customers can leverage both DAOS and Cortx without suffering the data siloing that so many of our customers identify as a major pain point in their data centers."
Object storage for different uses
Bent said Seagate doesn't view object storage as a "one-size-fits-all world." He said developers make design choices that result in differing feature sets to give customers a wide range of object storage choices. The Amazon S3 interface helps to prevent vendor lock-in, Bent said.
But Scott Sinclair, a senior analyst at Enterprise Strategy Group, said all customers worry about vendor lock-in once they commit to storing petabytes of data in a single system. He said they're stuck with whatever technology they choose because it takes too long to move the data.
Sinclair said the fact that Cortx is open source could help to mitigate the risks for large companies, since they could use their own development teams or hire a third party to work on features if Seagate decides to abandon its object store in the future.