
Amazon adds new S3 features for data lakes, hybrid cloud

Amazon released new features for its object storage service, including an open source file client, new data purchasing capabilities and private data replication for Outposts.

A new set of capabilities for AWS S3 object storage seeks to improve hybrid and private cloud deployments and ease the use of S3 in customer data lakes.

AWS, the dominant hyperscaler controlling more than a third of the public cloud market, tied the release of these features to an AWS Pi Day event Tuesday, which also marked the 17th anniversary of S3.

The new features include the general availability of AWS Data Exchange for Amazon S3, which enables third parties to sell data sets without customers duplicating the data to another S3 bucket, as well as the alpha release of Mountpoint for Amazon S3, an open source file client.

These capabilities will enable customers to build out data lakes faster and at a lower cost, according to Kevin Miller, vice president and general manager of Amazon S3.

New capabilities for hybrid or private clouds, such as local Amazon S3 Replication on Outposts and simplified private connectivity for Amazon Virtual Private Cloud (Amazon VPC), also enable greater availability and use of customer data, he said.

"We want customers to be able to use their data wherever they are and [with] whatever applications they're using," Miller said. "There's no value in collecting a lot of data if it can't be put to work effectively."

Cloud herding

Customers using AWS Outposts on-premises infrastructure, private cloud deployments or multiple AWS Regions can now benefit from a handful of new S3 object storage capabilities.

Amazon S3 on Outposts now supports direct S3 replication. Previously, customers had to move data they wanted to replicate from Outposts hardware into an AWS Region. Now, S3 replication rules can copy data directly to another Outpost or to another S3 bucket on the same Outpost.
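
As a rough sketch of how such a rule might be configured with boto3's S3 Control client: the account ID, Outpost bucket ARNs and IAM role below are placeholders, and the exact parameter shapes for Outposts buckets are an assumption.

    import boto3

    # S3 on Outposts buckets are managed through the S3 Control API
    # (placeholder account ID, Outpost bucket ARNs and IAM role).
    s3control = boto3.client("s3control", region_name="us-east-1")

    s3control.put_bucket_replication(
        AccountId="111122223333",
        Bucket="arn:aws:s3-outposts:us-east-1:111122223333:outpost/op-0123456789abcdef0/bucket/source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/outposts-replication-role",
            "Rules": [
                {
                    "ID": "replicate-to-second-outpost",
                    "Priority": 1,
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # replicate the whole bucket
                    "Destination": {
                        # another Outpost, or another bucket on the same Outpost
                        "Bucket": "arn:aws:s3-outposts:us-east-1:111122223333:outpost/op-0fedcba9876543210/bucket/dest-bucket"
                    },
                }
            ],
        },
    )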

Amazon VPC now offers private DNS options for S3 endpoint connections, enabling customers to route S3 requests to lower-cost endpoints. On-premises applications can use AWS PrivateLink, a service for private connectivity between VPCs and AWS services, to access S3 data for application workloads or data analysis.
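
A hedged sketch of that setup with boto3 follows; the VPC, subnet, security group and endpoint DNS values are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create an interface endpoint for S3 so traffic stays on AWS PrivateLink
    # (VPC, subnet and security group IDs are placeholders).
    endpoint = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.s3",
        SubnetIds=["subnet-0123456789abcdef0"],
        SecurityGroupIds=["sg-0123456789abcdef0"],
        PrivateDnsEnabled=True,
    )

    # An on-premises application reaching S3 over PrivateLink can point the SDK
    # at the interface endpoint's DNS name (hypothetical value shown).
    s3 = boto3.client(
        "s3",
        region_name="us-east-1",
        endpoint_url="https://bucket.vpce-0a1b2c3d4e5f67890-abcdefgh.s3.us-east-1.vpce.amazonaws.com",
    )
    s3.list_objects_v2(Bucket="example-data-lake-bucket")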

Amazon S3 Multi-Region Access Points now support data replicated across multiple AWS accounts, providing a single global endpoint for multi-region applications and routing S3 requests based on user policies.
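
In practice, an application passes the Multi-Region Access Point's ARN where it would otherwise use a bucket name. The boto3 sketch below uses a made-up account ID and access point alias, and multi-region request signing (SigV4A) typically needs boto3's AWS CRT extras installed.

    import boto3

    s3 = boto3.client("s3")  # SigV4A signing for Multi-Region Access Points needs the awscrt extras

    # The Multi-Region Access Point ARN stands in for the bucket name, so the same
    # call works regardless of which Region (or account) holds the nearest replica.
    response = s3.get_object(
        Bucket="arn:aws:s3::111122223333:accesspoint/mfzwi23gnjvgw.mrap",  # placeholder ARN
        Key="analytics/events/2023/03/14/part-0000.parquet",
    )
    data = response["Body"].read()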

These capabilities are basic necessities for hybrid cloud workloads, according to Naveen Chhabra, analyst at Forrester Research, and are frequently handled by third-party vendors such as MinIO.

AWS, he noted, has started embracing more hybrid cloud uses and capabilities in the past year to maintain its market dominance and ubiquity among application developers wanting to take advantage of cloud features.

"This is a long overdue admission of the fact that hybrid is a reality," Chhabra said. "Companies will run their own data centers with their own infrastructure and run their applications there."

Your slice of the data pie

New data lake capabilities should enable customers to better cash in on the data they're already generating and using, said Ray Lucchesi, president and founder of Silverton Consulting.


AWS Data Exchange for Amazon S3, the latest addition to AWS Data Exchange, lets customers buy and sell third-party data sets without copying them into their own buckets. Data providers can now license in-place access to data within Amazon S3 buckets.
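
From the subscriber's side, access ends up looking like ordinary S3 reads. The sketch below assumes the subscribed product exposes a data access point whose alias, made up here, can be used in place of a bucket name.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical access point alias granted through AWS Data Exchange for S3;
    # the provider's data is read in place, with no copy into the subscriber's bucket.
    resp = s3.get_object(
        Bucket="my-provider-datasets-x1y2z3-s3alias",  # placeholder alias
        Key="weather/daily/2023-03-14.csv",
    )
    print(resp["Body"].read()[:200])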

Companies already using the service include those that need access to meteorological data sets, stock images for AI model training or B2B marketing data. Lucchesi said he expects uses of, and demand for, such data only to expand.

"This is an ecosystem of data being rolled out in one fell swoop," Lucchesi said. "They're providing a farmers market of data."

Mountpoint for Amazon S3, an alpha release available for Linux, lets customers access Amazon S3 buckets and object storage using file storage APIs.

AWS has supported other data lake ingestion tools in the past, such as the S3A adapter for Apache Hadoop, but it had not offered a file client of its own for S3 object storage.

AWS designed Mountpoint for large-scale analytics applications. It allows users to map S3 buckets or prefixes into a file system namespace, browse buckets as though they were local files, and have high-throughput access to objects without provisioning or performance tuning.
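
Once a bucket is mounted (the alpha ships a mount-s3 command), reads look like ordinary file I/O; the mount path and keys in this sketch are examples only.

    # Assumes the bucket was mounted first, e.g.:  mount-s3 example-data-lake /mnt/datalake
    import os

    mount_dir = "/mnt/datalake/analytics/events"  # placeholder prefix within the bucket

    # Objects appear as read-only files under the mount point in the alpha release.
    for name in sorted(os.listdir(mount_dir)):
        path = os.path.join(mount_dir, name)
        with open(path, "rb") as f:
            header = f.read(16)  # sequential reads are translated into S3 GET requests
        print(name, len(header))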

Data lake customers could build their own connections from S3 object storage to file systems, AWS' Miller noted, but those implementations often lagged in performance.

"They also found that there were certain bugs in those implementations," Miller said. "[Our customers] really wanted us to stand behind [a service] and say we support this software for accessing your data in S3 using a file API."

The hyperscaler recommends Mountpoint for data lake applications that read large objects and don't need file features such as locking or permissions. The alpha release doesn't support writes; future updates will add support only for sequential writes to new objects.

Tim McCarthy is a journalist living on the North Shore of Massachusetts. He covers cloud and data storage news.
