SQS cross-region replication isn’t about making your queues redundant; it’s about making your message processing resilient across geographic failures.
Imagine you have a critical service that consumes messages from an SQS queue in us-east-1. If us-east-1 goes dark, your service stops. Cross-region replication, when configured, automatically copies messages from your source queue (e.g., my-source-queue in us-east-1) to a destination queue (e.g., my-source-queue-dr in us-west-2). Your disaster recovery (DR) application would then be pointed at my-source-queue-dr. When the primary region fails, you switch your application’s endpoint to the DR queue, and it starts consuming the replicated messages, minimizing downtime.
Here’s a simplified look at how it works in practice. Let’s say we have a source queue my-prod-queue in us-east-1 and we want to replicate it to my-dr-queue in us-west-2.
First, create the destination queue. It needs to be in the target region.
aws sqs create-queue --queue-name my-dr-queue --region us-west-2
Then, you configure the replication on the source queue. This is a crucial point: the replication settings live on the source.
aws sqs set-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-prod-queue \
--attributes \
"ReplicationConfiguration.Destinations.1.QueueARN=arn:aws:sqs:us-west-2:123456789012:my-dr-queue,ReplicationConfiguration.Destinations.1.Region=us-west-2" \
--region us-east-1
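Notice the QueueARN and Region for the destination. Once the configuration is accepted, AWS handles the rest: when a message arrives in my-prod-queue, SQS automatically attempts to send a copy to my-dr-queue. Note that ReplicationConfiguration is not among the attributes listed in the SetQueueAttributes API reference, so confirm the exact attribute names against the current SQS documentation before running this command.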
The magic happens in the ReplicationConfiguration attribute. This tells SQS: "Hey, for every message that lands here, also send a copy over to that ARN in that region." It’s a one-way street from the source to the destination.
Now, let’s see what happens when messages are sent.
On the producer side (in us-east-1):
import boto3
sqs_client_source = boto3.client('sqs', region_name='us-east-1')
response = sqs_client_source.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/123456789012/my-prod-queue',
    MessageBody='{"order_id": "ORD123", "item": "widget"}',
    MessageAttributes={
        'EventType': {
            'StringValue': 'NEW_ORDER',
            'DataType': 'String'
        }
    }
)
print(f"Sent message to source queue: {response['MessageId']}")
After this message is sent, SQS will asynchronously replicate it. It’s not instantaneous, but it’s typically within seconds for successful transmissions.
On the consumer side, in a DR scenario, you’d point your application to the destination queue in us-west-2.
import boto3
# In a DR scenario, this client would be configured for us-west-2
sqs_client_dr = boto3.client('sqs', region_name='us-west-2')
# Important: You need the QueueURL for the DR queue
dr_queue_url = 'https://sqs.us-west-2.amazonaws.com/123456789012/my-dr-queue'
response = sqs_client_dr.receive_message(
    QueueUrl=dr_queue_url,
    MaxNumberOfMessages=1,
    MessageAttributeNames=['All'],  # Without this, MessageAttributes is omitted from the response
    WaitTimeSeconds=20,  # Use long polling
    VisibilityTimeout=30  # Standard visibility timeout
)
if 'Messages' in response:
    message = response['Messages'][0]
    print(f"Received message from DR queue: {message['MessageId']}")
    print(f"Body: {message['Body']}")
    # .get() avoids a KeyError when the message carries no attributes
    print(f"Attributes: {message.get('MessageAttributes', {})}")
    # Delete the message after processing
    sqs_client_dr.delete_message(
        QueueUrl=dr_queue_url,
        ReceiptHandle=message['ReceiptHandle']
    )
    print("Deleted message from DR queue.")
else:
    print("No messages received from DR queue.")
The key to managing this during a failover is having your application ready to switch queues. Because an SQS client is bound to a region, the switch involves both the region and the queue URL, so you’d typically keep them in a configuration parameter that identifies the active queue. During a DR event, you update that parameter to point to the us-west-2 queue.
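As a minimal sketch of that switch, the snippet below resolves the active queue from a configuration parameter. The environment variable name ACTIVE_SQS_REGION, the QUEUE_URLS table, and the resolve_active_queue helper are all hypothetical names chosen for illustration; only the two queue URLs come from the walkthrough above.

```python
import os

# Hypothetical lookup table: region -> queue URL (URLs from the walkthrough).
QUEUE_URLS = {
    "us-east-1": "https://sqs.us-east-1.amazonaws.com/123456789012/my-prod-queue",
    "us-west-2": "https://sqs.us-west-2.amazonaws.com/123456789012/my-dr-queue",
}


def resolve_active_queue(env=None, default_region="us-east-1"):
    """Return (region, queue_url) for the queue the application should use.

    The active region is read from the hypothetical ACTIVE_SQS_REGION
    variable; when it is unset, the primary region is assumed.
    """
    env = os.environ if env is None else env
    region = env.get("ACTIVE_SQS_REGION", default_region)
    if region not in QUEUE_URLS:
        raise ValueError(f"No queue configured for region {region!r}")
    return region, QUEUE_URLS[region]
```

The returned pair would then feed boto3.client('sqs', region_name=region) and the QueueUrl argument of receive_message, so flipping one variable moves the consumer to the DR queue on its next start or configuration reload.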
You can verify the replication configuration by checking the attributes of the source queue.
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-prod-queue \
--attribute-names All \
--region us-east-1
Look for the ReplicationConfiguration section in the output. It will show the destination ARN and region.
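Reading the configuration confirms what was set, not that messages are arriving. One rough sanity check, sketched below, is to compare the approximate depth of both queues using the ApproximateNumberOfMessages attribute. The helper names approximate_depth and depth_gap are made up for this example, the clients are expected to be boto3 SQS clients for the respective regions, and the comparison is only meaningful while neither queue is being consumed (for instance, during a test run), since the counts are approximate and consumers drain them independently.

```python
def approximate_depth(client, queue_url):
    """Return the approximate number of visible messages in a queue.

    `client` is expected to behave like a boto3 SQS client.
    """
    response = client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def depth_gap(source_client, source_url, dr_client, dr_url):
    """Return source depth minus DR depth (positive means the DR queue holds fewer)."""
    return approximate_depth(source_client, source_url) - approximate_depth(
        dr_client, dr_url
    )
```

A gap that keeps growing while producers are active and consumers are idle would be a hint that copies are not reaching the DR queue.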
One thing that surprises many is that SQS replication doesn’t guarantee exactly-once delivery to the DR queue. It’s designed for at-least-once delivery, so if a replication attempt fails and is retried, you could see a duplicate message in the DR queue. Your consumer in the DR region must be idempotent to handle this. The MessageId is replicated, but the ReceiptHandle is not: a receipt handle is issued per receive call and only identifies that delivery, so it cannot serve as a deduplication key. Consumers should instead rely on message content or other custom IDs, such as the order_id in the example body, for idempotency.
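A minimal sketch of such an idempotent consumer follows. It assumes every message body is JSON carrying a stable order_id, as in the producer example; the IdempotentProcessor class is a hypothetical helper, and the in-memory set stands in for what would need to be a durable store shared across consumer instances.

```python
import json


class IdempotentProcessor:
    """Runs the handler at most once per order_id, even if duplicates arrive."""

    def __init__(self, handler, seen_ids=None):
        self.handler = handler
        # In-memory set for illustration only; it is lost on restart.
        self.seen_ids = set() if seen_ids is None else seen_ids

    def process(self, message):
        """Return True if the handler ran, False if the message was a duplicate."""
        body = json.loads(message["Body"])
        key = body["order_id"]  # assumed stable business ID set by the producer
        if key in self.seen_ids:
            return False
        self.handler(body)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be retried on redelivery.
        self.seen_ids.add(key)
        return True
```

Whether process returns True or False, the caller would still delete the message from the queue; a duplicate that is skipped should not be left to reappear after its visibility timeout.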
The next hurdle is how to fail back effectively once the primary region is restored.