Amazon S3 adds new capabilities for data lakes

The cloud object storage standard gains capabilities at re:Invent, including metadata generation and a managed table service.

AWS S3 object storage capabilities are expanding, with add-on services for data lakehouses and metadata alongside other new offerings from the vendor.

This week at AWS re:Invent 2024, the hyperscaler's annual conference, the vendor released Amazon S3 Tables to general availability and Amazon S3 Metadata to preview.

Amazon S3 Tables enables S3 objects to become tabular data stored in Apache Iceberg for data lake creation. AWS fully manages the service. Amazon S3 Metadata automatically generates object metadata related to data governance and information. Both features require that customers opt in at either the time of bucket creation or afterward.

More enterprises are using AWS S3 object storage for new workloads such as generative AI or business-critical tasks, according to Mary Jander, an analyst at Futuriom. The S3 Tables and S3 Metadata services aim to further integrate object storage into critical workloads, even though object storage was previously considered more of a repository for data, she said.

"They're all addressing the cost and complexity of managing data in S3," Jander said. "They had to take some action to keep the [enterprise] installed base happy."

In addition to S3 storage features, AWS released several other storage offerings at re:Invent, including new physical locations to securely upload customer data to the cloud, web access APIs for S3 buckets and automated tiering for OpenZFS.

Tables and metadata

Amazon S3 Tables is the first feature added to S3 object storage outside of improvements to feeds and speeds, according to Andy Warfield, vice president and distinguished engineer at AWS.

Customers are turning to S3 storage for data lakes and data warehouses after building up unstructured data within their AWS stores for more than a decade, Warfield said. Some customers have implemented Apache Iceberg table support manually but are then required to maintain the integration as they scale, which includes addressing issues such as compacting data and access controls.

Amazon S3 Tables automates those processes, improving performance and using a standardized API or interface for permissions, snapshots and other capabilities, he said.

"Today, you can build tables on top of S3 using Iceberg, but you're signing up for a ton of work," Warfield said. "We think this will be pretty broadly adopted by anyone working to integrate with Iceberg."

Apache Iceberg is a data project created by Netflix to solve its cloud data lake challenges. The project provides an open table format for large data sets similar to the Linux Foundation's Delta Lake open source project. Iceberg has seen increased adoption among enterprise customers in the past several years, with Adobe and Apple among its users, according to Netflix.

Warfield said additional capabilities are planned for the Tables service. Integration of Tables with AWS Glue Data Catalog is available in preview today, enabling customers to query and visualize data using AWS Analytics services such as Amazon Athena, Redshift, EMR and QuickSight.

Amazon S3 Metadata enables the automatic generation of object metadata in near real time for uses such as business analytics and inference applications, according to AWS.

At launch, capabilities include tracking the size and source of an object and integration with S3 Tables for additional querying. Customers can add their own metadata using object tags for specific uses, and query that data using SQL.
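
As a rough illustration of what querying object metadata with SQL might look like, the sketch below uses Python's built-in sqlite3 module as a local stand-in. It does not call any AWS service. The table name, column names and sample rows are hypothetical and are not the actual S3 Metadata schema.

```python
import sqlite3


def build_demo_table():
    """Create an in-memory table that mimics object metadata.

    The table and column names are assumptions for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_metadata ("
        "key TEXT, size_bytes INTEGER, source TEXT, tag_project TEXT)"
    )
    rows = [
        ("logs/2024/app.log", 1_200, "upload", "ops"),
        ("models/weights.bin", 5_000_000, "upload", "ml"),
        ("models/eval.csv", 48_000, "copy", "ml"),
    ]
    conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?)", rows)
    return conn


def total_size_by_project(conn):
    """Group objects by a customer-supplied tag and total their sizes."""
    query = """
        SELECT tag_project, COUNT(*) AS objects, SUM(size_bytes) AS total_bytes
        FROM object_metadata
        GROUP BY tag_project
        ORDER BY total_bytes DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for project, objects, total_bytes in total_size_by_project(build_demo_table()):
        print(project, objects, total_bytes)
```

In a real deployment, a query of this kind would presumably run through an analytics engine against the metadata table rather than a local database. The grouping by a customer-defined tag is the part the example is meant to show.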

"Now you can use any analytics interface to query any data within your bucket," Warfield said. "As unflashy as it is, the less our customers have to think about storage, the more successful we're being."

Both releases stress how important object storage has become for enterprise applications, according to Simon Robinson, an analyst at Informa TechTarget's Enterprise Strategy Group. Other vendors have also begun integrating data management tools closer to storage, such as Dell Technologies and its Data Lakehouse, he said.

"The storage vendors are creating these types of capabilities, and we're seeing a compression of the stack with more data management functions being bundled into the storage layer," Robinson said. "It's indicative of how data is becoming central to driving things such as [generative] AI."

These new capabilities might expand enterprise use of AWS S3 object storage but could become another form of vendor lock-in, Jander said. Customers will have to pay for the metadata repository storage, and those with a multi-cloud strategy might not be able to use all these capabilities.

"Every time you have [a storage] transaction, you have a cost associated with that," she said. "[AWS is] very much into proprietary vendor lock-in."

Other releases

Ahead of re:Invent, AWS debuted a handful of other storage offerings, including AWS Data Transfer Terminal shops, a web Storage Browser for Amazon S3, and Amazon FSx Intelligent-Tiering.

AWS Data Transfer Terminals are new physical locations where AWS customers can directly upload data into the AWS cloud securely and with high throughput, according to the vendor. Available in Los Angeles and New York at launch, the shops can be rented by the hour for customers to upload data using their own media or Amazon Snowball appliances.

Storage Browser for Amazon S3 is an open source front end for web applications that enables authorized end users to interact with S3 object data, including uploading, downloading, deleting and more. Developers can also customize the look of their specific front end.

Amazon FSx Intelligent-Tiering adds automatic data tiering for FSx for OpenZFS, AWS' file storage service. This storage offering automatically shifts data among frequently accessed, infrequently accessed and archival storage tiers.

Tim McCarthy is a news writer for TechTarget Editorial covering cloud and data storage.
