SQS is surprisingly cheap, but the illusion of "free" can lead to massive, unexpected bills if you’re not careful about how you’re interacting with it.

Let’s see SQS in action. Imagine a simple producer-consumer pattern. A web server (producer) receives user requests and puts messages onto an SQS queue. A fleet of worker instances (consumers) poll that queue, pull messages, process them, and then delete them.

Here’s what that looks like from the producer’s side, using the AWS SDK for Python (Boto3):

import boto3

sqs = boto3.client('sqs', region_name='us-east-1')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-processing-queue'

def send_message(message_body):
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=message_body
    )
    print(f"Sent message ID: {response['MessageId']}")

# Example usage:
send_message("process_user_data:user_id=123")

And here’s the consumer side:

import boto3
import time

sqs = boto3.client('sqs', region_name='us-east-1')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-processing-queue'

def process_messages():
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # Batching!
        WaitTimeSeconds=20,      # Long polling!
        VisibilityTimeout=300    # 5 minutes before an unacknowledged message becomes visible again
    )

    if 'Messages' in response:
        for message in response['Messages']:
            print(f"Received message ID: {message['MessageId']}")
            message_body = message['Body']
            # --- Process the message here ---
            print(f"Processing: {message_body}")
            # --- End processing ---

            # Delete the message after successful processing
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message['ReceiptHandle']
            )
            print(f"Deleted message ID: {message['MessageId']}")
    else:
        print("No messages received.")

# Example usage:
while True:
    process_messages()
    # Optional: with WaitTimeSeconds=20 the long poll already throttles empty receives, so this sleep is not required
    time.sleep(1)

This is the core loop: send_message and receive_message/delete_message. The cost isn’t in the data transferred, but in the API calls made to SQS. Every SendMessage, ReceiveMessage, and DeleteMessage request incurs a small charge. When you have thousands or millions of these per day, it adds up.
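To put a rough number on "it adds up", here is a back-of-the-envelope sketch. The $0.40-per-million-requests figure is an assumption based on the standard-queue list price at the time of writing; verify it against current SQS pricing before relying on these numbers.

```python
# Rough cost sketch: processing 10 million messages per day.
# Assumes roughly $0.40 per million SQS requests (standard queues) -- an
# assumption; check current pricing.
PRICE_PER_MILLION = 0.40
messages_per_day = 10_000_000

# Naive: one SendMessage, one ReceiveMessage, one DeleteMessage per message.
naive_requests = messages_per_day * 3

# Batched: 10 messages per SendMessageBatch / ReceiveMessage / DeleteMessageBatch call.
batched_requests = (messages_per_day // 10) * 3

naive_cost = naive_requests / 1_000_000 * PRICE_PER_MILLION
batched_cost = batched_requests / 1_000_000 * PRICE_PER_MILLION
print(f"Naive:   {naive_requests:,} requests/day, about ${naive_cost:.2f}/day")
print(f"Batched: {batched_requests:,} requests/day, about ${batched_cost:.2f}/day")
```

Thirty million requests a day versus three million: same workload, a tenth of the bill, before long polling even enters the picture.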

The key to optimization lies in minimizing these API calls. The two most impactful levers are batching and long polling.

Batching means sending or receiving multiple messages in a single API call.

  • SendMessageBatch can send up to 10 messages in one request (plain SendMessage sends exactly one).
  • ReceiveMessage can retrieve up to 10 messages in one request when you set MaxNumberOfMessages accordingly.
  • DeleteMessageBatch can delete up to 10 messages in one request; you supply an Id and the ReceiptHandle for each message (plain DeleteMessage deletes exactly one).

Long polling means that when you ReceiveMessage, SQS holds the request open for up to 20 seconds, waiting for messages to arrive. If messages arrive within that time, they are returned immediately. If not, the request times out and returns an empty response. This is crucial because it prevents "empty polls" – frequent ReceiveMessage calls that return nothing, which are pure API call cost with no work done.

Let’s break down the common cost drivers and how to combat them:

  1. Excessive ReceiveMessage calls (Short Polling): If your consumer code calls ReceiveMessage every second without waiting, and the queue is often empty, you’re burning API calls.

    • Diagnosis: Check your consumer code. Look for ReceiveMessage calls in a tight loop. Examine your SQS metrics in CloudWatch for NumberOfEmptyReceives. High numbers here indicate short polling or insufficient WaitTimeSeconds.
    • Fix: Implement long polling by setting WaitTimeSeconds to a value greater than 0 (e.g., WaitTimeSeconds=20). This tells SQS to hold the connection open for up to 20 seconds, significantly reducing the number of API calls when the queue is quiet. Your consumer code will then only make a ReceiveMessage call roughly every 20 seconds if no messages are available.
    • Why it works: Instead of polling every second (60 calls/minute), an idle consumer polls every 20 seconds (3 calls/minute), a 20x reduction in ReceiveMessage API calls on a quiet queue.
  2. Not batching SendMessage: If your producer sends each message in its own API call, even when it has many items ready at once.

    • Diagnosis: Review your producer code. If send_message is called in a loop for each individual item, you’re not batching.
    • Fix: Implement SendMessageBatch. Collect messages in memory (up to 10) and then send them in a single SendMessageBatch API call.
    • Why it works: Sending 10 messages individually costs 10 SendMessage API calls. Sending them via SendMessageBatch costs only 1 SendMessageBatch API call, a 10x reduction.
  3. Not batching DeleteMessage: If your consumer fetches 10 messages using MaxNumberOfMessages=10 but then deletes them one by one.

    • Diagnosis: Look at your consumer’s delete logic. If it iterates through response['Messages'] and calls sqs.delete_message for each one individually.
    • Fix: Use DeleteMessageBatch. After processing a batch of messages, construct a list of Id and ReceiptHandle pairs for all successfully processed messages and send them in a single DeleteMessageBatch call.
    • Why it works: Deleting 10 messages individually costs 10 DeleteMessage API calls. Using DeleteMessageBatch costs 1 DeleteMessageBatch API call, again a 10x reduction for that batch.
  4. Tight polling loops: Even with long polling available, application logic that issues a new ReceiveMessage immediately after the previous call returns (especially with WaitTimeSeconds=0) still produces high call volume.

    • Diagnosis: Your consumer logic might look like while True: receive_message(); process_messages();. This polls rapidly if WaitTimeSeconds is 0 or if processing completes very quickly.
    • Fix: Ensure WaitTimeSeconds is set to a value > 0 (e.g., 20) so empty polls block rather than return instantly. If you genuinely must run with WaitTimeSeconds=0, add a short time.sleep() after an empty receive, but long polling is the better tool.
    • Why it works: WaitTimeSeconds is the primary mechanism for throttling ReceiveMessage calls when no messages are present.
  5. Using FIFO queues for tasks that don’t require strict ordering or exactly-once processing: FIFO requests are priced higher than standard requests, and FIFO throughput limits can push you toward more queues and more API calls overall.

    • Diagnosis: You’re using FIFO queues, but your application logic can tolerate out-of-order or duplicate messages with minimal impact.
    • Fix: Use FIFO queues only when ordering or exactly-once processing is a strict requirement. For most background task processing, standard queues are sufficient and cheaper. If you must use FIFO, make sure your producer and consumer still batch aggressively.
    • Why it works: Standard queues offer higher throughput and a simpler, less costly internal model. FIFO queues cost more per API operation because of the bookkeeping required to maintain ordering and exactly-once delivery.
  6. Too many queues: SQS doesn’t charge per queue, but managing and interacting with many queues complicates application logic and adds API calls, e.g., querying queue attributes or resolving URLs for dynamic destinations.

    • Diagnosis: You have hundreds or thousands of SQS queues, each serving a very specific, small purpose.
    • Fix: Consolidate queues where possible. Use message attributes or a field within the message body to route messages to the correct processing logic within a single, larger queue.
    • Why it works: Reduces the number of queue-specific API calls (e.g., GetQueueUrl) and simplifies overall management.

The most common and impactful optimizations are long polling (WaitTimeSeconds > 0) and batching (SendMessageBatch, ReceiveMessage with MaxNumberOfMessages > 1, DeleteMessageBatch). Get these right and the API call count for a given workload can drop by an order of magnitude or more.

After fixing these, you’ll likely encounter the next most common "cost" concern: data transfer costs if your SQS queues are in regions different from your consumers, or if you’re using SQS within a VPC without VPC endpoints.

Want structured learning?

Take the full SQS course →