SQS ECS Autoscaling is the mechanism that lets you adjust the number of tasks running on Amazon Elastic Container Service (ECS) based on the number of messages waiting in an Amazon Simple Queue Service (SQS) queue.
Here’s an SQS queue that’s seeing a surge in messages:
{
  "ApproximateNumberOfMessages": 1500,
  "ApproximateNumberOfMessagesNotVisible": 200,
  "ApproximateNumberOfMessagesDelayed": 0,
  "CreatedTimestamp": 1678886400,
  "LastModifiedTimestamp": 1678887600,
  "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-processing-queue",
  "QueueName": "my-processing-queue"
}
When a message arrives, it is counted in ApproximateNumberOfMessages, the queue attribute that surfaces in CloudWatch as the ApproximateNumberOfMessagesVisible metric. If your ECS tasks are slow to process, ApproximateNumberOfMessagesNotVisible (messages that have been received but not yet deleted) will also grow. The visible count is what matters most for scaling.
The core idea is to tell ECS, "Hey, if ApproximateNumberOfMessagesVisible for my-processing-queue goes above 100, spin up more tasks. If it drops below 50, shut some down." This ensures you have enough processing power to keep up with incoming work without overspending on idle containers.
The magic behind this is AWS Application Auto Scaling. You configure a target tracking scaling policy on your ECS service. This policy uses a metric – in this case, ApproximateNumberOfMessagesVisible from SQS – to determine when to adjust the desired count of your ECS tasks.
Let’s set up a service that processes messages from my-processing-queue. We’ll assume a simple ECS service configuration where tasks pull messages from SQS, process them, and then delete them.
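As a concrete sketch of that worker loop (the function name, queue URL, and handler are illustrative, not part of any AWS API; in a real task you would pass in a boto3 SQS client):

```python
def process_batch(sqs, queue_url, handler, wait_seconds=20):
    """Long-poll for a batch of messages, process each, delete on success.

    `sqs` is anything exposing the boto3 SQS client's receive_message /
    delete_message interface. A message whose handler raises is NOT
    deleted, so SQS redelivers it after the visibility timeout.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # SQS batch maximum
        WaitTimeSeconds=wait_seconds,  # long polling cuts empty receives
    )
    processed = 0
    for msg in resp.get("Messages", []):
        try:
            handler(msg["Body"])
        except Exception:
            continue  # leave it on the queue to be redelivered
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
        processed += 1
    return processed
```

Deleting only after the handler succeeds is what makes the queue depth an honest signal of unfinished work.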
First, ensure the IAM task role referenced by your ECS task definition grants sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes on the target SQS queue.
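A minimal task-role policy statement might look like this (the queue ARN is the example one from above):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-processing-queue"
    }
  ]
}
```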
Next, create an ECS service and wire up auto scaling through the Application Auto Scaling service: register the ECS service as a scalable target, then attach a target tracking scaling policy to it.
Here’s a sample CloudFormation snippet for setting up the ECS service and its auto scaling:
Resources:
  MyProcessingService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref MyECSCluster
      ServiceName: my-processing-service
      TaskDefinition: !Ref MyTaskDefinition
      DesiredCount: 1  # Start with a single task
      # ... other service configuration (network, load balancers, etc.)

  MyScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 10  # Maximum number of tasks to scale up to
      MinCapacity: 1   # Minimum number of tasks to keep running
      # Ref on an ECS service returns its ARN, so use the Name attribute
      ResourceId: !Sub "service/${MyECSCluster}/${MyProcessingService.Name}"
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

  MyQueueDepthScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: MyQueueDepthScalingPolicy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref MyScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 100.0  # Aim to hold the queue near 100 visible messages
        # There is no predefined SQS metric type, so point a customized
        # metric at the queue's CloudWatch metric. This is crucial!
        CustomizedMetricSpecification:
          Namespace: AWS/SQS
          MetricName: ApproximateNumberOfMessagesVisible
          Statistic: Average
          Dimensions:
            - Name: QueueName
              Value: !GetAtt MyProcessingQueue.QueueName
        ScaleInCooldown: 300   # Wait 5 minutes before scaling in again
        ScaleOutCooldown: 300  # Wait 5 minutes before scaling out again
A critical detail: Application Auto Scaling has no predefined metric type for SQS queue depth. The predefined types (such as ECSServiceAverageCPUUtilization, or ALBRequestCountPerTarget, which is where ResourceLabel comes in) do not cover SQS. Instead, the policy must use a CustomizedMetricSpecification naming the AWS/SQS namespace, the ApproximateNumberOfMessagesVisible metric, and a QueueName dimension. That dimension is what ties the policy to the right queue, so be explicit: QueueName = my-processing-queue.
The TargetValue of 100.0 tells the scaling policy to try to hold the metric near 100. If it climbs above 100, the policy scales out. If it drops well below 100 (and the cooldown periods have passed), it scales in.
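One way to choose a TargetValue (the throughput and latency numbers here are assumptions you would measure, not prescriptions): size it from how much backlog a worker can clear within your latency budget.

```python
def target_backlog(msgs_per_second_per_task: float,
                   acceptable_latency_seconds: float) -> float:
    """Backlog one task can clear within the latency budget.

    Using this as the TargetValue means the policy keeps the queue
    drainable within the acceptable latency.
    """
    return msgs_per_second_per_task * acceptable_latency_seconds

# A task that sustains 2 msg/s with a 50 s latency budget tolerates
# a backlog of 100 messages.
```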
The ScaleInCooldown and ScaleOutCooldown values (in seconds) damp rapid fluctuations: after a scale-in activity completes, another scale-in cannot start for 300 seconds, and after a scale-out, further scale-outs are held back for 300 seconds (a scale-out can still pre-empt a scale-in cooldown to handle a spike). This is important to avoid thrashing.
The truly counterintuitive part of SQS autoscaling is that the TargetValue is not a hard threshold. Target tracking adjusts capacity proportionally: roughly, newCount = ceil(currentCount * metric / target). With 5 tasks, a target of 100, and 500 visible messages, the policy asks for 25 tasks (clamped to MaxCapacity), not 6. Also note that ApproximateNumberOfMessagesVisible does not fall automatically as you add tasks the way CPU utilization does, so if you want a true per-task target, AWS's recommended pattern is to publish a custom "backlog per task" metric: visible messages divided by the number of running tasks. With that metric and a target of 100, 5 tasks scale out only once the total backlog exceeds 500, so the total message count that triggers scaling becomes dynamic, based on your current task count.
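The proportional adjustment can be sketched as follows (a simplification: the real service evaluates CloudWatch alarms over multiple datapoints and applies the cooldowns before acting):

```python
import math

def target_tracking_capacity(current_tasks: int, metric_value: float,
                             target_value: float,
                             min_capacity: int = 1,
                             max_capacity: int = 10) -> int:
    """Approximate target tracking's desired-count calculation:
    scale capacity in proportion to metric / target, then clamp
    to the scalable target's min/max bounds."""
    desired = math.ceil(current_tasks * metric_value / target_value)
    return max(min_capacity, min(max_capacity, desired))
```

For example, 2 tasks facing a metric of 300 against a target of 100 triple to 6 tasks, while 5 tasks facing a metric of 40 shrink to 2.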
The next thing you’ll likely encounter is ensuring your tasks can actually process messages fast enough to keep the queue depth down. If tasks are too slow, you might see ApproximateNumberOfMessagesNotVisible grow as tasks pull messages but can’t delete them before the visibility timeout expires, leading to redelivery.
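A quick sanity check for that failure mode (batch size, per-message time, and the safety factor are assumptions you would tune to your workload): verify that a full received batch can be processed well inside the visibility timeout.

```python
def visibility_timeout_ok(batch_size: int, seconds_per_message: float,
                          visibility_timeout: float,
                          safety_factor: float = 2.0) -> bool:
    """True if a worker can finish (and delete) a whole received batch
    before SQS makes the messages visible again and redelivers them."""
    worst_case = batch_size * seconds_per_message
    return worst_case * safety_factor <= visibility_timeout
```

If the check fails, raise the queue's visibility timeout, shrink MaxNumberOfMessages, or extend the timeout per message with ChangeMessageVisibility while processing.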