
Tip

Storage vs. database: Moving data to the AWS cloud

David Linthicum reviews the benefits of database choices from Amazon Web Services and the advantages of leveraging object storage like Simple Storage Service as a storage category.

Figuring out where and how to store data has been a fundamental problem in systems development for the last 30 years. While users have shifted data storage and management to database management systems, the growth of cloud storage options means a database is no longer the automatic choice.

In the world of AWS cloud, users have database choices such as:

  • Amazon RDS
  • Amazon DynamoDB
  • Amazon Redshift
  • Amazon SimpleDB
  • Multiple choices of relational database Amazon Machine Images, running on Amazon Elastic Compute Cloud (EC2) and Elastic Block Store (EBS), which provide scalable compute and storage as well as control over instances

What's more, there are three broad categories of storage in the world of AWS cloud:

  • Object storage, such as Amazon Simple Storage Service (S3)
  • Elastic Block Store (EBS)
  • Ephemeral storage

First, let's address the advantages of leveraging object storage, such as S3, putting EBS and ephemeral storage aside for the sake of brevity. Object storage is more scalable than traditional file system storage, which is typically what users think about when comparing storage to databases for data persistence. Instead of organizing files in a directory hierarchy, object storage systems store files in a flat organization of containers (called "buckets" in Amazon S3) and use unique IDs (called "keys" in S3) to retrieve them.

The advantage is that object storage systems require less metadata than file systems to store and access files. They are typically more efficient because they store the metadata with the object itself, which reduces the overhead of managing file metadata separately. For the developer, this means object storage can be scaled out almost endlessly just by adding nodes.
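To make the flat bucket-and-key model concrete, here is a minimal in-memory sketch. The `ToyBucket` and `StoredObject` names are hypothetical and this is not the S3 API; it only illustrates that keys are plain strings in a flat namespace, that metadata travels with each object, and that anything resembling a "folder" is just a key prefix.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object's bytes plus the metadata stored alongside it."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket (not the S3 API)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # flat mapping: key -> StoredObject, no directories

    def put(self, key, data, **metadata):
        # Metadata is kept with the object rather than in a separate tree.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        # Retrieval is a single lookup by unique key.
        return self._objects[key]

    def list_keys(self, prefix=""):
        # A "folder" is only a naming convention: filter keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("media-archive")
bucket.put("photos/2014/beach.jpg", b"...jpeg bytes...", content_type="image/jpeg")
bucket.put("logs/app.log", b"started\n", content_type="text/plain")

photo = bucket.get("photos/2014/beach.jpg")
photo_keys = bucket.list_keys(prefix="photos/")
```

Because every lookup goes straight from key to object, nothing in this model depends on a shared directory tree, which is one reason such a namespace is easy to spread across many nodes.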

Users typically leverage object storage for unstructured or archival data. This includes media (photos, sound, video, etc.), Web content, documents and backups of existing cloud or non-cloud systems. Indeed, many of the more common personal cloud storage systems leverage S3 as their storage system of choice.

Additionally, object storage is better suited for nonrelational databases, including use with data applications such as Hadoop/MapReduce analytics. It's also good for storing log files and sensor data.

Generally speaking, databases, including the ones offered by AWS, use different approaches and mechanisms for data storage, and thus are typically leveraged for different use cases. Examples include storing transactional data, providing multi-user access and placing constraints on the data stored.
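As a small illustration of what "placing constraints on the data stored" means in practice, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational database. The `accounts` table and its columns are made up for the example and do not come from any AWS service.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# A valid row is accepted.
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100))

# A row that violates the CHECK constraint is rejected by the database itself.
rejected = False
try:
    conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("bob", -5))
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

An object store would accept either payload without complaint, so a rule like this would have to be enforced in application code instead.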

Databases can retrieve and update information while hiding the internal physical storage system from the database user. Typically, a data catalog and concurrency control services protect data from multi-user access issues.

What's more, databases provide recovery services that can roll back to a consistent state, as well as more sophisticated security services and support for data communication integrity. Databases also have features that promote data independence, meaning that it's much easier to break up the data into logical chunks. Finally, databases provide utility services such as database and performance monitoring, data extraction and importation.
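Rolling back to a consistent state can be sketched with a transaction that fails halfway through. This again uses Python's built-in `sqlite3` module as a local stand-in; the `ledger` table and the `transfer` helper are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO ledger VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return False if the transfer is rolled back."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE ledger SET balance = balance + ? WHERE owner = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE ledger SET balance = balance - ? WHERE owner = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The credit to bob succeeds first, then the debit from alice violates the
# CHECK constraint, so the whole transaction is undone.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT owner, balance FROM ledger"))
```

After the failed transfer both balances are unchanged, even though the first update had already been applied inside the transaction.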

When building a business system, unless it requires special analytical services, databases are typically the go-to technology. As storage systems such as Amazon S3 become more sophisticated, they are certainly an option for the specific use cases described in this tip. As always, the path users choose should be directly linked to specific core requirements.
