Deploying Big Files with AWS Lambda and EFS Made Easy
Enhance Lambda Capabilities: Simple Steps to Deploy Big Deep Learning Models with EFS
Background
In some cases, we want to deploy our trained deep learning models or pre-trained models from platforms like Hugging Face to AWS Lambda for serverless inference.
While the official service for such tasks is Amazon SageMaker, it can be overly complex for simple deployment needs. SageMaker offers benefits like model management and MLOps, but there are scenarios where a simpler solution is preferable.
Deploying large models with Lambda alone is challenging because of the service's size limits: 50 MB for direct uploads (zipped) and 250 MB unzipped. Lambda container images stored in ECR can be up to 10 GB, but bundling every model file into the image can lead to slower cold starts and increased costs.
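To make those limits concrete, here is a small sketch that checks which packaging routes can hold an artifact of a given size. The function name `packaging_options` is my own illustration, not part of any AWS SDK, and the numbers are simply the limits quoted above.

```python
# Limits quoted in the paragraph above, expressed in bytes.
ZIP_UPLOAD_LIMIT = 50 * 1024**2        # zipped, direct upload
UNZIPPED_LIMIT = 250 * 1024**2         # unzipped deployment package
CONTAINER_IMAGE_LIMIT = 10 * 1024**3   # container image stored in ECR


def packaging_options(zipped_bytes: int, unzipped_bytes: int) -> list[str]:
    """Return the packaging routes that can hold an artifact of this size.

    Hypothetical helper for illustration only.
    """
    options = []
    if zipped_bytes <= ZIP_UPLOAD_LIMIT and unzipped_bytes <= UNZIPPED_LIMIT:
        options.append("zip")
    if unzipped_bytes <= CONTAINER_IMAGE_LIMIT:
        options.append("container-image")
    # EFS is always listed: the model lives outside the deployment package.
    options.append("efs")
    return options
```

For example, a model that is 2 GB zipped and 5 GB unzipped no longer fits in a zip package, so only the container image and EFS routes remain.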
To achieve efficient deployment, I recommend using Lambda with EFS as the file system.
How it Works
EFS is a file system that Lambda can access if it is mounted properly. To achieve this, both the EFS resource and Lambda need to be within a VPC.
According to AWS best practices, only resources that must be reachable from outside the VPC should be placed in public subnets. Typically, this includes the NAT gateway. Keeping everything else in private subnets adds an extra layer of network security, because critical resources are then only accessible from within the VPC.
Consequently, EFS mount targets should be placed in private subnets. Lambda functions attached to a VPC do not receive public IPs, so they should also be placed in private subnets.
Here is an overview of the steps:
1. Create a Lambda function in a VPC with a private subnet.
2. Create an EFS in the same VPC as the Lambda function, also in a private subnet.
3. Create an EFS Mount Target in the Availability Zones (AZs) where the Lambda function will be deployed.
4. Create a Security Group to enable the Lambda function to access the EFS.
5. Mount the EFS from the Lambda settings.
To put files on the EFS, I usually create a temporary EC2 instance, mount the EFS on it, and transfer my files to the EFS through that instance.
As a prerequisite, I assume that we already have a VPC with a NAT Gateway and internet access set up from the private subnet.
Sounds Hard Enough... How to build it?
There are many ways to build this infrastructure, and I generally recommend using some form of Infrastructure as Code (IaC) such as Terraform, CloudFormation, or AWS SAM.
Since we are dealing with Lambda, I will be using AWS SAM in this tutorial. If you already use Terraform in your stack, I suggest managing EFS and EC2 with Terraform and Lambda with AWS SAM.
However, for simplicity, I will be using 100% AWS SAM for this tutorial. The easiest way to jump-start this is by running sam init.
sam init
After that, you should see the prompt below. Choose 1 - AWS Quick Start Templates.
Which template source would you like to use?
1 - AWS Quick Start Templates
2 - Custom Template Location
Choice:
There are a lot of templates. Choose 14 - Lambda EFS example
1
Choose an AWS Quick Start application template
1 - Hello World Example
2 - Data processing
3 - Hello World Example with Powertools for AWS Lambda
4 - Multi-step workflow
5 - Scheduled task
6 - Standalone function
7 - Serverless API
8 - Infrastructure event management
9 - Lambda Response Streaming
10 - Serverless Connector Hello World Example
11 - Multi-step workflow with Connectors
12 - GraphQLApi Hello World Example
13 - Full Stack
14 - Lambda EFS example
15 - DynamoDB Example
16 - Machine Learning
Pick your Python version.
Which runtime would you like to use?
1 - python3.9
2 - python3.8
3 - python3.12
4 - python3.11
5 - python3.10
By this point, we should have all we need to deploy Lambda and EFS to the cloud.
The complete generated template can be found here.
SAM Init, Lambda EFS example. Generated Template
Okay, Problem solved?
Not exactly. While sam init can give us a jump start for our development, there are two problems when using it for Lambda with EFS:
1. It doesn't provide a way to put data on the EFS.
2. The bigger problem is that we cannot use it locally. While there are issues on GitHub discussing this, there is currently no way to test it locally.
In the next sections, we will try to fix the above problems.
Improve SAM Template & Add EC2
When adding an EC2 instance, it's not necessary to rewrite the entire template. However, the template generated by sam init can be hard to read and understand, so I've rewritten it to make it clear what inputs are needed and what is required to create resources like EFS, Lambda, and EC2.
Also, to fix the second problem, we need our Lambda to be image-based instead of a zip file.
First, let's start with the parameters:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Build Lambda with EFS!
Parameters:
  VpcId:
    Type: String
    Default: your-vpc-id
  VpcCidr:
    Type: String
    Default: 12.1.0.0/16
  PublicSubnetId:
    Type: String
    Default: your-vpc-id's public subnet
  PrivateSubnetId:
    Type: String
    Default: your-vpc-id's private subnet
Resources:
  # We fill this in later
As you can see, we need the VPC ID, VPC CIDR, one public subnet ID, and one private subnet ID. Using these inputs, we can create our resources.
Lambda Resources
Next, we define the Lambda resources:
Resources:
  #
  # Lambda
  #
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: [lambda.amazonaws.com]
            Action: ['sts:AssumeRole']
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # Logging plus the network interfaces a VPC-attached Lambda needs
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - ec2:CreateNetworkInterface
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DeleteNetworkInterface
                Resource: '*'
              # Mounting and writing to EFS
              - Effect: Allow
                Action:
                  - elasticfilesystem:ClientMount
                  - elasticfilesystem:ClientWrite
                Resource: '*'
  LambdaFunction:
    Type: AWS::Serverless::Function
    # The mount target must exist before the function can attach the file system
    DependsOn: EFSMountTarget
    Properties:
      PackageType: Image
      FunctionName: my-function-name
      Role: !GetAtt LambdaExecutionRole.Arn
      # Compute
      Timeout: 600
      MemorySize: 4096
      Architectures:
        - x86_64
      # Network
      FileSystemConfigs:
        - Arn: !GetAtt EFSAccessPoint.Arn # defined in the EFS section
          LocalMountPath: /mnt/files
      VpcConfig:
        SecurityGroupIds:
          - !Ref EFSAccessSecurityGroup # defined in the EFS section
        SubnetIds:
          - !Ref PrivateSubnetId
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./src
      DockerTag: test
To create the Lambda function, we need two resources: the Lambda itself and an IAM Role for executing it (accessing EFS, putting logs, etc.).
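For context, here is a minimal sketch of what a handler inside ./src could look like once the file system is mounted at /mnt/files. This is an assumption on my part, not the code from the generated template: the `MODEL_MOUNT_PATH` environment variable, the `artifact` event key, and the `model.bin` file name are all hypothetical.

```python
import json
import os
from pathlib import Path

# Cache survives between warm invocations of the same execution environment.
_cache: dict[str, bytes] = {}


def mount_path() -> Path:
    # Assumption: defaults to the LocalMountPath from the template;
    # MODEL_MOUNT_PATH is a hypothetical override for local runs.
    return Path(os.environ.get("MODEL_MOUNT_PATH", "/mnt/files"))


def load_artifact(name: str) -> bytes:
    """Read a file from the mount once and reuse it on later invocations."""
    safe_name = Path(name).name  # drop any directory parts from the request
    if safe_name not in _cache:
        _cache[safe_name] = (mount_path() / safe_name).read_bytes()
    return _cache[safe_name]


def handler(event, context):
    name = event.get("artifact", "model.bin")  # hypothetical event shape
    try:
        data = load_artifact(name)
    except FileNotFoundError:
        return {"statusCode": 404, "body": json.dumps({"error": f"{name} not found"})}
    return {"statusCode": 200, "body": json.dumps({"artifact": name, "bytes": len(data)})}
```

Caching at module level means the file is read from EFS only on the first invocation of each execution environment, which matters for large model files.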
EFS Resources
Next, we create our EFS resources:
  #
  # EFS
  #
  EFSAccessSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Security Group for Lambda and EFS communication
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 2049 # NFS port used by EFS
          ToPort: 2049
          CidrIp: !Ref VpcCidr
  EFSFileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: false
  EFSMountTarget:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref EFSFileSystem
      SubnetId: !Ref PrivateSubnetId
      SecurityGroups:
        - !Ref EFSAccessSecurityGroup
  EFSAccessPoint:
    Type: AWS::EFS::AccessPoint
    Properties:
      FileSystemId: !Ref EFSFileSystem
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "0777"
The above defines the bare minimum so that our EFS can work. We need an EFS File System, a mount target, and an access point. Additionally, we need a Security Group to allow access between EFS and Lambda.
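To illustrate what the ingress rule on the Security Group permits, here is a toy model of it in Python. It is only a sketch of the rule's logic (TCP port 2049 from anywhere inside the VPC CIDR), not how AWS evaluates security groups, and `rule_allows` is a name I made up for this example.

```python
import ipaddress

NFS_PORT = 2049  # the port opened by EFSAccessSecurityGroup


def rule_allows(rule_cidr: str, source_ip: str, port: int) -> bool:
    """Model one TCP ingress rule: the source must fall inside the CIDR
    and the destination port must be the NFS port."""
    in_range = ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule_cidr)
    return in_range and port == NFS_PORT


# With the default VpcCidr from the parameters section:
print(rule_allows("12.1.0.0/16", "12.1.4.25", 2049))  # True: inside the VPC, NFS port
print(rule_allows("12.1.0.0/16", "12.2.0.1", 2049))   # False: outside the VPC CIDR
print(rule_allows("12.1.0.0/16", "12.1.4.25", 22))    # False: port not opened
```

Because the rule is scoped to the whole VPC CIDR, any network interface inside the VPC, including the ones Lambda creates in the private subnet, can reach the mount target on the NFS port.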
EC2 Resources
Finally, we add the EC2 instance to interact with EFS:
  #
  # EC2 Instance
  #
  EC2InstanceIAMRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess
        - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref EC2InstanceIAMRole
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-07c589821f2b353aa # ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20231207
      InstanceType: t2.micro
      SubnetId: !Ref PublicSubnetId
      SecurityGroupIds:
        # Reuses the security group defined in the EFS section,
        # since no separate LambdaSecurityGroup is defined in this template
        - !Ref EFSAccessSecurityGroup
      IamInstanceProfile: !Ref EC2InstanceProfile
(AMI ID might need to be updated based on the region and availability)
This allows the EC2 instance to access EFS. Additionally, you will need to mount the EFS to the EC2 instance before accessing it. You can find the mounting guide in the AWS official documentation.
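Once the EFS is mounted on the instance, transferring the model is an ordinary file copy. As a rough sketch, assuming the files are already on the EC2 instance and the EFS is mounted at some directory (the paths in the usage note are placeholders of mine), the copy can be verified with checksums like this:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large model files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_mount(source_dir: Path, mount_dir: Path) -> list[Path]:
    """Copy every file under source_dir into mount_dir, preserving the
    relative layout, and verify each copy by comparing checksums."""
    copied = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = mount_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"checksum mismatch for {dst}")
        copied.append(dst)
    return copied
```

On the instance this could be called as `copy_to_mount(Path("./models"), Path("/mnt/efs"))`, where both paths are hypothetical and depend on where you uploaded the files and where you mounted the EFS.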
The final template is available in my GitHub repository.
Bigger Problem: Local Invocation.
While the inability to test locally is not the end of the world, it is certainly a major inconvenience. Without local invocation, you need to deploy every time you want to test your code. This can take hours of your time and has certainly taken hours or even days of mine.
Worry no more, I have found a solution by reverse-engineering sam-cli.
When you run sam local invoke, sam-cli builds an image, starts a container from it, opens an endpoint, calls it, and then removes the container. So to solve this, we just need to run the container ourselves with a Docker volume attached at the same directory where we mount our EFS. In this case, we define it as:
LocalMountPath: /mnt/files
To make this happen, we build our SAM project:
sam build --cached
Using the built image, we run the container with the volume attached to it:
docker run \
  --rm \
  -p 8000:8080 \
  --platform linux/amd64 \
  -v "$(pwd)/efs:/mnt/files" \
  -e AWS_LAMBDA_FUNCTION_MEMORY_SIZE=8192 \
  -e AWS_LAMBDA_FUNCTION_TIMEOUT=600 \
  -e AWS_LAMBDA_FUNCTION_NAME=my-function-name \
  -e AWS_ACCESS_KEY_ID=dummy \
  -e AWS_SECRET_ACCESS_KEY=dummy \
  lambdafunction:test
After that, you can invoke your function with the following command:
curl \
-X POST \
http://localhost:8000/2015-03-31/functions/function/invocations \
-d '{"test":"test"}'
This approach allows you to test your Lambda function locally, saving you significant time and effort.
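If you prefer to script the invocation instead of using curl, the same request can be sent from Python with only the standard library. This is a sketch that assumes the container is listening on host port 8000 as in the docker run command, and that the function returns JSON; `invoke_local` is a helper name I made up.

```python
import json
import urllib.request

# Assumption: host port 8000, as mapped in the docker run command.
LOCAL_URL = "http://localhost:8000/2015-03-31/functions/function/invocations"


def invoke_local(payload: dict, url: str = LOCAL_URL, timeout: float = 600.0):
    """POST a JSON event to the local endpoint and decode the JSON reply."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `invoke_local({"test": "test"})` sends the same event as the curl command, which makes it easy to loop over a folder of sample events while the container is running.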
Closing
By improving our SAM template and adding the necessary configurations, we've streamlined the deployment process for Lambda and EFS resources. We've also implemented a solution for local invocation, significantly reducing development and testing time. Now, you can store your model in EFS and read it from Lambda, enabling efficient handling of large files for serverless inference.
Thank you for following along, and happy coding!