Enterprise data storage startups to watch in 2021
Launching a data storage startup is always challenging, and the pandemic made it even tougher in 2020. Here are six startups that can change the way enterprises buy storage.
As with every aspect of life in 2020, data storage startups felt the effects of the pandemic. At least one new storage company -- Samsung Technologies-backed Stellus Technologies -- closed shop before ever bringing its key-value-based flash file system to market. Other startups soldiered on, securing venture funding with the goal of commercializing new storage products. Here are six storage startups we will be tracking in 2021.
Catalog
Specialty: DNA storage
CEO: Hyunjun Park
Funding: $21 million
Worth watching because: Catalog is creating a new form factor for dense data storage
Market hurdles: Products are not yet commercially available
What if you could afford an enterprise-ready parallel storage system that exploits the properties of DNA -- one with vast storage capacity that endures for millennia, is searchable and could potentially supplant legacy archival technologies?
That is the heady goal of Boston-based Catalog, a venture-funded spinoff of the Massachusetts Institute of Technology. The data storage startup is testing an inkjet-style device, nicknamed "Shannon," that strings snippets of synthetic DNA together to form a dense storage medium. The DNA can be assembled in various combinations, much like a Lego set.
DNA storage involves converting digital data into a DNA code, which is then synthesized into molecular strands. It has the potential to meet demand created by rapid data growth; analyst firm IDC expects the world's data to reach 160 zettabytes by 2025.
Catalog claims it can bypass the tedious chemistry of existing encoding methods to encode data rapidly, at a cost comparable to existing storage media. Instead of 0s and 1s, data values are expressed in terms of A, G, C and T -- the four nucleotides that make up DNA.
"We create a unique DNA molecule to represent each of those four locations and the associated value," Catalog CTO David Turek said.
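Catalog's combinatorial scheme isn't spelled out in detail, so as a point of reference, here is the simplest textbook way to express binary data as nucleotides: two bits per base. The mapping table and the `encode`/`decode` names are illustrative assumptions, not Catalog's encoding.

```python
# Textbook 2-bits-per-nucleotide mapping (an illustrative assumption,
# not Catalog's scheme).
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Invert encode(): every four nucleotides become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


print(encode(b"Hi"))  # "H" = 0x48 -> CAGA, "i" = 0x69 -> CGGC
```

Practical DNA codes typically add constraints, such as avoiding long runs of the same base, plus error-correcting redundancy, so real-world densities fall below the two bits per nucleotide shown here.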
Catalog is striving to encode 1 TB per day in DNA. Whether its product will mature to commercial availability is an open question. Catalog expects to offer DNA storage as a service in about two years. Customer trials are under way and include an oil-and-gas firm, a multimedia company and a research institute.
Investors have poured more than $19 million into Catalog, including a $10 million oversubscribed round in September. Catalog is not alone, however: a handful of other companies are fueling research on DNA storage, including a partnership of Microsoft, Illumina, Twist Bioscience and Western Digital Corp.
Fungible
Specialty: Fungible computational storage system
CEO: Pradeep Sindhu
Funding: $311 million
Worth watching because: Fungible storage is built on its Data Processing Unit microprocessors
Market hurdles: Intel, Amazon are flexing muscles in computational storage
The AI era is driving demand for much faster storage. Chip newcomer Fungible designed a computational storage system aimed at cloud service providers. The objective is to process data as close as possible to where it is stored.
Fungible is one of a handful of data storage startups developing a new class of storage microprocessors as alternatives to the x86 CPU. Prominent venture firms have poured more than $310 million into Fungible to help it commercialize its storage technology.
Computational storage systems use offload cards to handle processing-intensive tasks. In Fungible's case, the card is an internally developed Data Processing Unit (DPU). Fungible's microprocessor integrates memory and processing to handle networking, security, storage and virtualization -- a miniaturized form of hyper-converged infrastructure.
The Fungible Storage Cluster block system includes the FS1600 scale-out storage server and an associated out-of-band control plane. Fungible's TrueFabric NVMe over Fabrics stack creates software-defined connection topologies that enable low-latency endpoints to scale to thousands of servers.
"Fungible's vision is to totally re-architect the data center to be composable," not just the storage component, said Shankar Chandran, a senior VP of Samsung Catalyst, the venture arm of Samsung Technologies and an early Fungible investor.
The FS1600 supports 12 PCIe-connected hot-pluggable NVMe SSDs, with slots for NVDIMM-N and RDIMM memory cards alongside the onboard DPU. The Fungible Storage Cluster is available in 46 TB, 92 TB and 184 TB capacities.
Fungible bills its storage as a drop-in replacement for existing arrays, with managed cloud providers as its initial target.
Fungible isn't the only chip maker drawing attention with these types of cards. Nvidia sells the BlueField-2 DPU, which is based on Mellanox SmartNIC technology. Intel is also getting in on the action with its $2 billion acquisition of Israel-based Habana Labs, which developed new types of silicon designed to accelerate AI training and inferencing. Amazon Web Services recently announced EC2 instances that use Habana's Gaudi accelerators for AI model training.
Nebulon
Specialty: Software-defined storage combined with cloud-based AI
CEO: Siamak Nazari
Funding: $18 million
Worth watching because: Controller offload card makes storage easy to provision
Market hurdles: Limited to cloud providers at this time
Nebulon combines cloud analytics, composable infrastructure and server-based flash storage. The Nebulon product comprises two separate components: an add-in server card and an AI-powered control plane. Nebulon calls its PCIe card a Services Processing Unit (SPU); the Nebulon ON control plane performs analytics in the Amazon Web Services or Google Cloud Platform public clouds.
Like other new vendors listed here, Nebulon is making a play for specialized applications that depend on speed and performance, and it adds ease of use to the mix. The storage startup claims its technology enables application owners to provision storage without the intervention of a storage administrator.
Nebulon attacks the processing bottleneck with an offload device that handles storage processing on the card and sends most management functions to the cloud. It claims this reduces the overhead associated with array-based provisioning.
The Nebulon device is packaged in 2U 24-drive servers sold under OEM deals with HPE and Supermicro. The SPU replaces the traditional RAID cards and host bus adapters found in servers. Each SPU has two 25 Gigabit Ethernet (GbE) ports that form the data plane for each application cluster, along with a 1 GbE port for cloud connectivity.
The SPU connects the flash storage residing in each node and presents itself to the host as a local storage controller. This local PCIe device only manages host I/O -- application, server and storage metrics are shuttled to the Nebulon ON cloud for prescriptive analytics. Enterprises can build a Nebulon cluster, known as an nPod, that scales to 32 servers. Data services, including mirrors, snapshots and volumes, are configured in the Nebulon ON SaaS portal.
Nebulon's brain trust includes former HPE execs: CEO Siamak Nazari, COO Craig Nunes, CTO Sean Etaati and executive chairman Dave Scott were early employees at 3PAR, which Hewlett-Packard acquired in 2010 for $2.35 billion and whose arrays became HPE's flagship flash storage.
Pliops
Specialty: Flash-optimized storage
CEO: Uri Beitler
Funding: $40 million
Worth watching because: Exploding data rates underscore demand for faster flash
Market hurdles: Turning proofs of concept into OEM partners, customers
Israel-based Pliops developed its own microprocessor to optimize flash-based database storage. The Pliops Storage Processor (PSP) is a key-value engine on a dedicated PCIe card that offloads reads and writes from x86 CPUs.
Pliops is gearing up for growth in flash adoption; it predicts flash could account for nearly half of every bit stored by 2030. Although SSD performance is orders of magnitude higher than that of disk, traditional CPU technology remains a bottleneck.
Flash often sits idle as processors max out. Storage software takes shortcuts to execute management tasks, which generates additional I/O. Data reduction can help somewhat, but it often imposes a performance penalty.
Those are the problems Pliops set out to solve. Specifically, many storage engines use a B-tree architecture to organize data and its associated index. Although B-trees support random reads and writes and in-place updates, their hierarchical structure inherently incurs write amplification, which increases storage overhead.
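A simplified model shows where that write amplification comes from: if changing one small record forces the engine to rewrite the whole page that holds it, the amplification factor is roughly page size divided by record size. The 4 KiB page, the record sizes and the function name are illustrative assumptions, and the model ignores logging, page splits and parent-node updates, all of which add further writes.

```python
def btree_page_write_amplification(record_bytes, page_bytes=4096):
    """Simplified model: an in-place update of one record rewrites its whole page.

    Returns bytes written to flash per byte of user data changed. Write-ahead
    logging, page splits and index-node updates are ignored (they add more).
    """
    if not 0 < record_bytes <= page_bytes:
        raise ValueError("record must fit in a page")
    return page_bytes / record_bytes


for size in (64, 128, 1024):
    factor = btree_page_write_amplification(size)
    print(f"{size:>5}-byte record -> {factor:.0f}x write amplification")
```

Small records hurt the most: under this model, updating a 64-byte record writes 64 times as much data to flash as the application changed.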
Pliops created a new data structure that optimizes storage management for databases. The Pliops algorithm handles garbage collection, indexing, merging, packing and sorting. PSP hardware accelerates these tasks, using a fraction of the compute load and power they typically consume on x86 servers.
Part of Pliops' pitch is the option to use low-cost QLC NAND SSDs for mainstream workloads, with erasure coding to protect against drive failures. The device presents an NVMe block interface with thin provisioning and compression based on Zstandard (ZSTD), the open source algorithm developed at Facebook. Commonly used databases write to Pliops via its library of NVMe-compatible key-value APIs.
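Pliops doesn't publish the details of its data structure here. For contrast with in-place B-tree updates, the sketch below shows the generic log-structured approach that many key-value engines use: every write is an append, an index points at each key's newest record, and a compaction pass performs the garbage collection, merging and sorting. The class and method names are hypothetical, and this is not Pliops' algorithm.

```python
class LogStructuredKV:
    """Toy append-only key-value store (generic technique, not Pliops' design).

    Writes are appended to a log instead of updating records in place; an
    in-memory index points at each key's newest record, and compact() does
    garbage collection by merging the live records into a new, key-sorted log.
    """

    _TOMBSTONE = object()  # sentinel marking a deleted key

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> position of that key's newest record in self.log

    def put(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def delete(self, key):
        # Deletion is also an append; the tombstone is dropped at compaction.
        self.put(key, self._TOMBSTONE)

    def get(self, key, default=None):
        pos = self.index.get(key)
        if pos is None or self.log[pos][1] is self._TOMBSTONE:
            return default
        return self.log[pos][1]

    def _live_records(self):
        return [
            (key, self.log[pos][1])
            for key, pos in self.index.items()
            if self.log[pos][1] is not self._TOMBSTONE
        ]

    def garbage_ratio(self):
        """Fraction of log records that are stale versions or tombstones."""
        if not self.log:
            return 0.0
        return 1 - len(self._live_records()) / len(self.log)

    def compact(self):
        """Merge and sort: keep only the newest live version of each key."""
        live = sorted(self._live_records(), key=lambda record: record[0])
        self.log = live
        self.index = {key: pos for pos, (key, _) in enumerate(live)}


kv = LogStructuredKV()
for version in range(3):
    kv.put("b", version)   # three versions of "b"; two become stale
kv.put("a", "x")
kv.delete("a")
print(kv.garbage_ratio())  # 4 of 5 records are garbage
kv.compact()
print(kv.log)              # only ("b", 2) survives
```

The trade-off is the mirror image of a B-tree: writes are sequential and never rewrite a page in place, but stale records pile up until compaction reclaims them, and compaction itself consumes CPU and I/O.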
The company was founded by Uri Beitler, Moshe Twitto and Aryeh Mergi. Beitler and Twitto launched the company after leading the Samsung SSD Controller division in Israel. Pliops is the fourth startup for Mergi, who sold XtremIO to EMC for $430 million in 2012.
Storj Labs
Specialty: Decentralized cloud storage
CEO: Ben Golub
Funding: $35 million
Worth watching because: Companies want to reduce cloud egress fees
Market hurdles: Convincing IT users to store data on anonymous endpoints
Anyone who has booked a ride with Uber will appreciate the business model of Storj Labs. Just as Uber doesn't own a fleet of cabs, Storj Labs doesn't own any of the equipment it uses for back-end object storage. Instead, the crypto-financed, Boston-based startup relies on a peer-to-peer network of node operators -- mostly IT organizations, but also private individuals -- who sell spare bandwidth and disk capacity to end users via the Storj Tardigrade object storage service.
Storj is a decentralized cloud storage service: enterprises lease capacity on thousands of server nodes in more than 80 countries. Enterprises pay using Storj tokens, and node operators get a cut based on how much of their storage capacity is consumed.
Why is this potentially interesting for enterprises? Two words: egress fees. Storj claims its distributed network costs a fraction of what major public cloud providers charge to retrieve data from their clouds. Tardigrade subscribers pay about 1 cent per GB per month for storage and 4.5 cents per GB for download bandwidth. Storj said about 60 cents of every dollar gets shelled out to node operators.
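A quick calculator makes the pricing model concrete. The default rates are assumptions based on Tardigrade's 2020 list prices of $10 per TB stored per month and $45 per TB downloaded (1 cent and 4.5 cents per GB); the 60% operator share is Storj's own estimate; and `MonthlyUsage` and `tardigrade_bill` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    stored_gb: float      # average GB stored during the month
    downloaded_gb: float  # GB of egress (download) bandwidth used


def tardigrade_bill(usage, storage_rate=0.01, egress_rate=0.045, operator_share=0.60):
    """Return (bill, operator_payout) in dollars for one month.

    Defaults are assumptions: $0.01 per GB-month stored, $0.045 per GB
    downloaded, and roughly 60% of revenue passed on to node operators.
    """
    bill = usage.stored_gb * storage_rate + usage.downloaded_gb * egress_rate
    return bill, bill * operator_share


bill, payout = tardigrade_bill(MonthlyUsage(stored_gb=10_000, downloaded_gb=2_000))
print(f"monthly bill: ${bill:,.2f}; paid to node operators: ${payout:,.2f}")
```

For 10 TB stored and 2 TB downloaded, the model gives a $190 monthly bill ($100 for storage, $90 for egress), of which about $114 goes to node operators. Passing a different `egress_rate` makes it easy to compare against another provider's published egress pricing.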
Storj encrypts a file upon upload and shards the data into 80 discrete segments with Reed-Solomon erasure coding. An entire data file never resides on any single server. Tardigrade technology ensures data shards rotate between nodes on a continuing basis. Files can be reassembled from as few as 30 of those shards. Storj monitors all drives and automatically removes a drive before it fails.
Despite the potential cost savings, convincing enterprises to trust a network of anonymous endpoints is a big challenge for Storj. Fellow startup Sia offers similar blockchain-based storage built on its peer-to-peer network and software development kit. Object vendor Filebase built its S3-compatible object storage on the Sia platform.
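Storj's codec isn't shown here, but the k-of-n idea behind "80 shards, any 30 will do" can be sketched with a minimal systematic Reed-Solomon code: a 30-byte segment defines a polynomial, each of the 80 shards stores one evaluation of it, and any 30 shards pin the polynomial down again through Lagrange interpolation. Assumptions: the sketch works in the prime field GF(257) so the arithmetic is plain modular arithmetic, whereas production codecs typically use GF(2^8), so parity values here can reach 256 and would not fit in one byte; `rs_encode` and `rs_decode` are hypothetical names; and encryption, striping of whole files and shard placement are omitted.

```python
import random

P = 257  # prime modulus: GF(257) holds every byte value 0..255


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is den's modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def rs_encode(segment, n):
    """Systematic Reed-Solomon: the k data bytes are the polynomial's values at
    x = 1..k, and shard x holds its value at x, for x = 1..n."""
    k = len(segment)
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < P")
    data_points = list(zip(range(1, k + 1), segment))
    return [(x, _interpolate_at(data_points, x)) for x in range(1, n + 1)]


def rs_decode(shards, k):
    """Rebuild the k data bytes from any k distinct (x, value) shards."""
    unique = list(dict(shards).items())  # drop duplicate shard indexes
    if len(unique) < k:
        raise ValueError(f"need {k} distinct shards, got {len(unique)}")
    points = unique[:k]
    return bytes(_interpolate_at(points, x) for x in range(1, k + 1))


segment = b"thirty bytes of example data!!"  # k = 30 data bytes
shards = rs_encode(segment, n=80)             # 80 shards, as described above
survivors = random.sample(shards, 30)         # lose any 50 of them
assert rs_decode(survivors, k=30) == segment
```

Keeping 80 shards for every 30 bytes of data costs about 2.67 times the raw capacity; that overhead is what lets a segment survive the loss of up to 50 shards.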
Trilio
Specialty: TrilioVault native container backup
CEO: David Safaii
Funding: $15 million
Worth watching because: Trilio's distribution through IBM Cloud Pak provides momentum
Market hurdles: Leading backup software vendors may have a head start
With container adoption on the rise, safeguarding the data in containerized applications takes on greater significance in the AI era. That's the need Trilio aims to fill.
The startup made its TrilioVault cloud-native data protection platform generally available in 2020 for Kubernetes and IBM-owned Red Hat OpenShift and Red Hat Virtualization. IBM has integrated TrilioVault in IBM Cloud Pak for container data protection in hybrid clouds.
As containerized applications move from the development bench to production, the requirements of protecting data remain the same. The Trilio launch comes at an opportune moment, as the pandemic has forced IT teams to collaborate remotely on application development.
Trilio data protection works without agents or custom scripts. TrilioVault is packaged as a Kubernetes custom resource definition and runs natively as a Kubernetes API extension. The backup platform allows Kubernetes to perform snapshots and disaster recovery of application data and associated metadata residing in containers.
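The appeal of a CRD-based design is that a backup becomes a declarative Kubernetes object rather than a script. The sketch below builds such an object as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as readily as YAML. Everything inside it is hypothetical: the `backup.example.com` API group, the `BackupRequest` kind and every `spec` field are invented for illustration and are not TrilioVault's schema; only the `apiVersion`/`kind`/`metadata`/`spec` layout is standard Kubernetes.

```python
import json


def backup_request(app_namespace, target, retain=7):
    """Build a HYPOTHETICAL custom resource for a CRD-based backup operator.

    API group, kind and spec fields are invented for illustration; they are
    not TrilioVault's schema. Only the top-level layout is standard Kubernetes.
    """
    return {
        "apiVersion": "backup.example.com/v1alpha1",  # hypothetical API group
        "kind": "BackupRequest",                      # hypothetical kind
        "metadata": {"name": f"{app_namespace}-backup", "namespace": app_namespace},
        "spec": {
            "includeMetadata": True,   # hypothetical: back up resource definitions too
            "snapshotVolumes": True,   # hypothetical: snapshot persistent volumes
            "target": target,          # hypothetical: name of a backup target object
            "retainSnapshots": retain,
        },
    }


manifest = backup_request("orders", target="s3-archive")
print(json.dumps(manifest, indent=2))
```

Once such a resource is applied, an operator's controller watching that kind would do the work; no agent runs inside the application's containers.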
Customers in automotive, financial services, defense and telecom deploy Trilio for incremental point-in-time snapshots against any storage target. TrilioVault feature upgrades introduced in November include protection of applications that share the same namespace, application-consistent database backups and data migration to multiple clouds.
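TrilioVault's internals aren't described here, but a common way for incremental point-in-time backups to avoid recopying unchanged data is to compare per-block content hashes against the previous snapshot and ship only the blocks that differ. This is a minimal sketch of that generic technique, not Trilio's implementation; the 4 KiB block size and the function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size (an assumption)


def block_hashes(data, block_size=BLOCK_SIZE):
    """SHA-256 digest of each fixed-size block of a volume image."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def incremental_snapshot(data, previous_hashes, block_size=BLOCK_SIZE):
    """Return (changed_blocks, hashes) for a new point-in-time snapshot.

    changed_blocks maps block index -> bytes for blocks that are new or whose
    hash differs from the previous snapshot; only those are copied to the
    backup target. Pass an empty list to take a full (first) snapshot.
    """
    hashes = block_hashes(data, block_size)
    changed = {
        idx: data[idx * block_size:(idx + 1) * block_size]
        for idx, digest in enumerate(hashes)
        if idx >= len(previous_hashes) or previous_hashes[idx] != digest
    }
    return changed, hashes


volume = bytearray(4 * BLOCK_SIZE)  # a 4-block volume of zeros
full, hashes = incremental_snapshot(bytes(volume), [])
volume[5000] = 0xFF                 # modify one byte in block 1
delta, hashes = incremental_snapshot(bytes(volume), hashes)
print(sorted(full), sorted(delta))  # [0, 1, 2, 3] [1]
```

The first snapshot copies every block; the second copies only block 1. A production tool would also need to record volume shrinkage, application metadata and consistency points, which this sketch leaves out.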
TrilioVault for Red Hat OpenShift builds on OpenShift's Kubernetes foundation, while Red Hat Virtualization, which is based on KVM rather than Kubernetes, gets its own TrilioVault version.
Trilio has plenty of company in container backup. Its long-term future may involve an acquisition. That was the path taken by competitors Kasten and Portworx -- storage startups that made our previous watch lists. Virtualized backup giant Veeam bought Kasten to flesh out its Kubernetes-based backup, while Portworx got scooped up by Pure Storage. Asigra, Catalogic, Cohesity, Druva and Zerto are among the data protection vendors that engineered their software for agentless Kubernetes backup.