Primary Data DataSphere update improves scalability
Primary Data updates its DataSphere metadata engine to boost scalability and performance, support more hardware options, and connect to multiple Amazon S3-based cloud buckets.
Primary Data updated its data virtualization software to boost scalability and performance, support more storage and host operating systems, and connect to multiple Amazon S3 API-compatible cloud buckets.
The Primary Data DataSphere 1.2 metadata engine supports the automated management of more than a billion data objects and files. The prior version of the product could handle millions of files, according to Douglas Fallstrom, senior director of product management at Primary Data.
Primary Data also extended support to the latest versions of Dell EMC's Isilon and NetApp's ONTAP-based storage systems as well as Mac and Windows application servers, in addition to Linux-based hosts. DataSphere 1.2 can also migrate data to multiple Amazon S3-compatible cloud buckets, as opposed to the product's prior support for one bucket at a single cloud vendor, Fallstrom said.
Additional product enhancements include file-level cloning performed outside the data path to preserve performance, and nondisruptive high-availability failover for rapid recovery. The Primary Data DataSphere 1.2 update also adds real-time performance metrics that surface hot files in the user dashboard.
"When we came to market, we were focused on the test-and-dev environment, and that's where we've seen our initial success," said Primary Data CEO Lance Smith. "We're adding these capabilities and tuning the performance to allow us to get to other markets and applications. This [update] allows us to go into production environments that are mission critical. We can go into larger deployments that are multi-petabyte."
Smith said Primary Data expects to expand from its initial sweet spots of media and entertainment and oil and gas into verticals such as finance, life sciences and exploratory data analysis.
How DataSphere works
Primary Data DataSphere 1.2 arrives less than six months after the Los Altos, Calif.-based startup first made the product generally available. The company describes DataSphere as a "metadata engine that automates the flow of data across enterprise infrastructure."
Primary Data DataSphere separates the control path from the data path, enabling the storage-agnostic software to manage files independently of the underlying hardware, which can be file-, block- or object-based. The out-of-band metadata engine collects information about the data, such as its location, size and frequency of access.
When an application needs to access data, the DataSphere server provides its physical location, much as a Domain Name System (DNS) server looks up the Internet Protocol (IP) address of the server hosting a website. Primary Data DataSphere supports NFS v3 and v4, as well as SMB 2.1 and 3.x, for file access.
"The benefit of the NFS or SMB file-based approach is that we can put data on any storage device that we're managing underneath the hood," Fallstrom said.
Fallstrom said a local VMware virtual machine (VM) running DataSphere Extended Services (DSX) software provides access to local block storage, and the block storage is used as part of the overall namespace. He said DataSphere supports Amazon S3-compatible object storage as an active archive tier.
DSX Data Mover software can migrate active or inactive data to the on-premises storage that best suits an application's needs, based on granular policies the customer sets. Fallstrom said the software nondisruptively enters the data path, makes a copy of the data and exits once the migration is complete. NFS 4.2 facilitates the live data migration, according to Fallstrom.
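A rough sketch of that kind of policy-driven placement decision follows. The tier names, thresholds and record fields are invented for illustration and are not DataSphere's actual policy language.

```python
# Hypothetical sketch of policy-driven data placement: files that
# have not been accessed within a threshold are mapped to a cheaper
# tier. Thresholds, tier names and the FileRecord shape are invented.

import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400

@dataclass
class FileRecord:
    name: str
    size_bytes: int
    last_access: float  # Unix timestamp from the metadata engine

def choose_tier(record: FileRecord, now: float) -> str:
    """Map a file to a storage tier based on access recency."""
    idle_days = (now - record.last_access) / SECONDS_PER_DAY
    if idle_days > 180:
        return "s3-archive"       # cold: object storage in the cloud
    if idle_days > 30:
        return "capacity-nas"     # warm: cheaper on-premises NAS
    return "performance-nas"      # hot: keep on the fast tier

now = time.time()
files = [
    FileRecord("scene42.mov", 12_000_000_000, now - 400 * SECONDS_PER_DAY),
    FileRecord("edit.prproj", 250_000_000, now - 2 * SECONDS_PER_DAY),
]
for f in files:
    print(f.name, "->", choose_tier(f, now))
```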
Smith said the Primary Data software significantly improves the performance of NAS arrays in particular, clustering them under a logical space and providing parallel access to the load-balanced storage elements.
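One plausible reading of that design, sketched below with invented names: spread new file placements, and with them client traffic, across the NAS nodes behind a single logical space by always choosing the least-loaded node.

```python
# Hypothetical sketch of load balancing across clustered NAS nodes
# behind one logical namespace: each new file goes to whichever node
# currently holds the least data, so client traffic spreads out.
# Node names and the balancing rule are invented for illustration.

class NasCluster:
    def __init__(self, nodes):
        # bytes currently placed on each node
        self.load = {node: 0 for node in nodes}

    def place(self, filename: str, size_bytes: int) -> str:
        target = min(self.load, key=self.load.get)  # least-loaded node
        self.load[target] += size_bytes
        return target

cluster = NasCluster(["nas-a", "nas-b", "nas-c"])
for name, size in [("a.bin", 500), ("b.bin", 300), ("c.bin", 200), ("d.bin", 100)]:
    print(name, "->", cluster.place(name, size))
```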
"Primary Data is in the business of putting the right data in the right place at the right time," Smith said.
Moving data to the cloud
Primary Data's DSX Cloud Connector shifts files to Amazon S3-based cloud storage over an encrypted link, after deduplicating and compressing the data. For instance, a customer could move old inactive data from high-performance storage to slower, less expensive cold storage or to the public cloud to reduce costs.
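The general shape of such a transfer can be illustrated with standard tools. The sketch below uses boto3 and gzip, with a content-hash key as a simple stand-in for deduplication; Primary Data's actual connector internals are not public at this level of detail, and the bucket name is invented.

```python
# Hypothetical sketch of a compress-dedupe-upload step to S3 using
# boto3. Content is addressed by its SHA-256 hash, so identical files
# are uploaded only once (a simple form of deduplication); boto3
# talks to S3 over HTTPS, so the link is encrypted in transit.

import gzip
import hashlib
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "example-cold-tier"  # hypothetical bucket name

def archive_file(path: str) -> str:
    with open(path, "rb") as f:
        raw = f.read()
    key = hashlib.sha256(raw).hexdigest() + ".gz"  # content-addressed key
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return key  # identical content already archived; skip upload
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise
    s3.put_object(Bucket=BUCKET, Key=key, Body=gzip.compress(raw))
    return key
```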
Scott Sinclair, a senior analyst at Enterprise Strategy Group, said IT organizations historically threw unstructured data into a "giant pool of just stuff." Now that the public cloud is an option, they often don't know which data to move off-site and which data to keep on premises, he said.
"People want to move to the cloud, but it's complicated," Sinclair said. "Having something like DataSphere that can layer on top of your existing stuff and ease that movement to the cloud can make your life a lot easier."
Arun Taneja, founder, president and consulting analyst at Taneja Group, said enterprise organizations need visibility into the exabytes of data sitting in archaic NAS boxes. He said DataSphere 1.2 "simply makes this visibility more automated." He cited the importance of the product's built-in analytics.
"In the world of unstructured data made up of files and objects, the quality of the metadata engine and the ability to run analytics, using that metadata, is paramount," Taneja said. "Many scale-out NAS and object players are adding analytics in a desperate attempt to stay relevant, and the new players in this space are building that capability right into the architecture."
Taneja envisions Primary Data competing with NAS and object storage players, in addition to cooperating with them.
The Primary Data DataSphere software runs in a virtual machine or on bare-metal hardware, and Primary Data publishes a recommended hardware list. DataSphere pricing is subscription-based, starting at $100,000 for a single metadata engine, according to Primary Data.
Smith said a customer managing a petabyte of storage would save about $1 million. He claimed some customers managing multiple petabytes of data have saved millions of dollars by more fully using their existing resources, reducing product licenses, migrating cold data off expensive storage, and forgoing high-performance tier purchases.