AWS Lambda functions have storage capacity, longevity limitations

AWS Lambda's high scalability can serve as a limitation, as data can be lost if too much strain is placed on the data store. These methods can work around Lambda memory limitations.

AWS Lambda is a valuable tool for provisioning a highly scalable logic tier for a variety of uses, including big data processing, building dynamic web back ends, microservices and mobile application infrastructure. But because AWS Lambda functions are short-lived compared to traditional architectures built in Amazon EC2, enterprise IT must carefully select the right data storage tier for Lambda architectures.

With Elastic Compute Cloud (EC2) architectures, enterprises pay for access to a processor that operates similarly to a traditional virtual machine -- with internal memory and access to external file-based storage. With the Lambda architecture, the enterprise pays to run short-lived functions, which can create some unique storage challenges. Enterprise IT must define the use of storage in those AWS Lambda functions to avoid data loss.

AWS Lambda storage offers less time and memory

AWS Lambda functions have two main constraints compared to EC2: a shorter lifespan and more limited internal storage. AWS Lambda functions currently have a maximum duration of 300 seconds, whereas EC2 instances could theoretically run for years. If a job in Lambda hits its data capacity, the data is dropped. This is a challenge, particularly if the data storage tier can't process the output from Lambda functions in a timely manner.
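One way to keep a time limit from silently dropping work is to stop early and hand off whatever is left. The sketch below is illustrative only: `process_with_deadline`, `handle` and `safety_margin_ms` are hypothetical names, and the 10-second margin is an arbitrary assumption. Inside a Python Lambda handler, `remaining_ms` could be the context object's `get_remaining_time_in_millis` method.

```python
from typing import Callable, Iterable, List, Tuple


def process_with_deadline(
    records: Iterable[str],
    handle: Callable[[str], str],
    remaining_ms: Callable[[], int],
    safety_margin_ms: int = 10_000,  # assumed margin, tune per workload
) -> Tuple[List[str], List[str]]:
    """Process records until the time budget is nearly spent.

    Returns (results, unprocessed) so the caller can pass the unprocessed
    records to another invocation instead of losing them on timeout.
    """
    results: List[str] = []
    pending = list(records)
    for index, record in enumerate(pending):
        # Stop while there is still time left to hand off the remainder.
        if remaining_ms() < safety_margin_ms:
            return results, pending[index:]
        results.append(handle(record))
    return results, []
```

The caller decides what to do with the unprocessed list, for example queueing it for a fresh function.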

Each function is also limited to 512 megabytes of temporary local storage and 1,024 file descriptors. In contrast, Amazon EC2 Container Service can access tens to thousands of gigabytes of local instance storage and has virtually limitless access to more permanent storage services, like Elastic Block Store, off the physical machine.

Scalability introduces new constraints

AWS Lambda was built from the ground up to be highly scalable, which means an enterprise can refactor a traditional batch job that might have taken hours to run on EC2 to run in a few minutes or less on Lambda functions running in parallel. In some cases, it also costs less.

For example, the department store Nordstrom rewrote its recommendation engine to run on Lambda, dropping its processing time from 20 minutes to a few seconds and cutting its project costs in half. But this bump in speed places a greater burden on the data store. For these use cases, developers must ensure that databases can scale data ingestion rates along with Lambda processing.

Developers should implement a rollback mechanism for when data cannot be processed in a timely manner. If the Lambda function takes too long to run, the data is lost -- a real problem if the outputs of different Lambda functions depend on one another. A rollback mechanism restarts the processing on another function. If data from separate AWS Lambda functions is dependent, a developer must set up buffering or sorting so that data can be recorded into the recipient data store in the appropriate order.
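The buffering-and-sorting idea can be sketched as a small reordering buffer. This is a minimal illustration, not a prescribed design: it assumes each function tags its output with a sequence number, and `OrderedWriter`, `submit` and `write` are hypothetical names. Results that arrive early are held until the gap before them is filled, and a result that a restarted function delivers a second time is ignored.

```python
from typing import Callable, Dict


class OrderedWriter:
    """Release results to the data store strictly in sequence order."""

    def __init__(self, write: Callable[[str], None], first_seq: int = 0) -> None:
        self._write = write          # hypothetical callback that stores one result
        self._next_seq = first_seq   # next sequence number allowed to be written
        self._held: Dict[int, str] = {}

    def submit(self, seq: int, payload: str) -> None:
        # A retried function may resend a result that was already written.
        if seq < self._next_seq:
            return
        self._held[seq] = payload
        # Flush every result that is now contiguous with what was written.
        while self._next_seq in self._held:
            self._write(self._held.pop(self._next_seq))
            self._next_seq += 1

    @property
    def pending(self) -> int:
        """Number of results still waiting for an earlier one to arrive."""
        return len(self._held)
```

Holding results this way trades memory and latency for ordering, so the buffer size is worth watching on large jobs.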

But issuing PUT/POST/COPY requests from Lambda to the receiving data store could drive up the price of the job. AWS charges $0.005 per 1,000 requests, which can add up on a data-intensive processing job. It often makes more sense to buffer multiple records or results in a Lambda function to reduce the number of PUT/POST/COPY requests. The tradeoff is that more buffering increases latency.
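The effect of buffering on request charges is simple arithmetic. The sketch below uses the $0.005 per 1,000 requests figure quoted above; the function name and the record counts are hypothetical. At that rate, writing 100 million records one request at a time comes to about $500, while packing 100 records into each request comes to about $5.

```python
import math

PRICE_PER_1000_REQUESTS = 0.005  # USD, the figure quoted in the article


def request_cost(record_count: int, records_per_request: int = 1) -> float:
    """Estimate request charges for writing record_count records."""
    # A partly filled final batch still costs one full request.
    requests = math.ceil(record_count / records_per_request)
    return requests / 1000 * PRICE_PER_1000_REQUESTS


unbuffered = request_cost(100_000_000)      # one record per request
buffered = request_cost(100_000_000, 100)   # 100 records per request
```

This covers request charges only; storage, data transfer and Lambda execution time are billed separately.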

Secure the network versus the data store

AWS Lambda connects to back-end data stores through Amazon Virtual Private Cloud (VPC) and AWS Identity and Access Management (IAM), which protect data stores from cyberattacks. VPC connectivity allows greater flexibility, and IAM integration provides fine-grained control and a more secure perimeter. VPC provides a single layer of protection at the network level, but IAM enables enterprises to control permissions to data with an additional layer of protection on the storage tier. AWS makes it easy to manage IAM credentials for a collection of dynamically provisioned Lambda containers using roles and permission groups.

VPC-hosted storage options include Amazon Relational Database Service, ElastiCache for a managed in-memory cache, Redshift for populating an enterprise data warehouse and nearly any EC2 web service for existing apps.

IAM-enabled data stores include DynamoDB for managing small -- 400 KB or less -- records, Simple Storage Service for highly scalable object storage, Elasticsearch Service for search and analytics applications and Kinesis for high-speed event processing.
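The 400 KB record ceiling suggests a simple routing rule: keep small records in DynamoDB and send anything larger to object storage. The sketch below is only a rough illustration. `choose_store` is a hypothetical helper, the limit is assumed to mean 400 * 1,024 bytes, and the JSON-encoded length is a stand-in for the way DynamoDB actually measures a record, which differs in detail.

```python
import json
from typing import Any, Dict

# Assumption: "400 KB" is read as 400 * 1024 bytes.
DYNAMODB_RECORD_LIMIT_BYTES = 400 * 1024


def choose_store(record: Dict[str, Any]) -> str:
    """Pick a target store from the approximate serialized record size."""
    size = len(json.dumps(record).encode("utf-8"))
    return "dynamodb" if size <= DYNAMODB_RECORD_LIMIT_BYTES else "s3"
```

For example, `choose_store({"id": 1})` returns `"dynamodb"`, while a record carrying a 500,000-character string is routed to `"s3"`.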

Lambda development is mostly focused on highly ephemeral data processing tasks. In the short term, applications built on EC2 and containers will remain the dominant architectures for longer-running apps because they're easier to integrate with longer-running data stores. Better tools and architecture could bring the durability of virtual machine apps to Lambda function-based infrastructures. However, Lambda is currently one tool among many in the enterprise IT storage arsenal.
