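Amazon SQS’s "redrive" functionality is really two separate mechanisms: the redrive policy that sends failed messages to a dead-letter queue, and the DLQ redrive that moves them back out. You’re likely only thinking about the first.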

Imagine you have a queue, my-processing-queue. Messages are piling up in its dead-letter queue (DLQ), my-processing-dlq. You want to get them back into my-processing-queue to try processing them again.

Here’s how you’d do it programmatically using the AWS SDK for Python (Boto3):

import json

import boto3

sqs = boto3.client('sqs', region_name='us-east-1')

# Replace with your actual queue URLs and ARNs
source_queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-processing-queue'
dlq_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-processing-dlq'
dlq_arn = 'arn:aws:sqs:us-east-1:123456789012:my-processing-dlq'

# First, configure the source queue to use the DLQ.
# This is a one-time setup for a given queue-DLQ pair, done by setting the
# RedrivePolicy attribute on the source queue (not by starting a move task).
try:
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={
            'RedrivePolicy': json.dumps({
                'deadLetterTargetArn': dlq_arn,
                'maxReceiveCount': '5',  # example value; tune for your workload
            })
        },
    )
    print("Redrive policy configured on the source queue.")
except Exception as e:
    print(f"Error configuring redrive policy: {e}")


# Now, to actually move messages from the DLQ back to the source queue:
# This is the part you run when you want to redrive messages.
# The source of a move task must be a queue that is configured as a DLQ.
try:
    response = sqs.start_message_move_task(
        SourceArn='arn:aws:sqs:us-east-1:123456789012:my-processing-dlq',  # The DLQ is the source here
        DestinationArn='arn:aws:sqs:us-east-1:123456789012:my-processing-queue',  # The original queue is the destination
    )
    task_handle = response['TaskHandle']
    print(f"Message move task started with handle: {task_handle}")

    # To check progress, call sqs.list_message_move_tasks(SourceArn=<DLQ ARN>)
    # and read the 'Status' of the entries in 'Results': 'RUNNING',
    # 'COMPLETED', 'FAILED', 'CANCELLING', or 'CANCELLED'.
    # The task_handle is what you pass to sqs.cancel_message_move_task
    # if you need to stop the move.
    # For simplicity, this example doesn't include polling logic.

except Exception as e:
    print(f"Error starting message move task: {e}")

The start_message_move_task API call is the key here. When you initiate a message move task, you specify a SourceArn and a DestinationArn. SQS then starts a background process that pulls messages from the source and sends them to the destination.

The crucial insight is that there is no dedicated "redrive from DLQ to original queue" API call. A redrive is simply a message move task: you call start_message_move_task with the DLQ as the SourceArn and your original processing queue as the DestinationArn. If you omit DestinationArn, SQS sends each message back to the queue it originally came from.

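The first step, configuring the original queue to point to its DLQ, does not use start_message_move_task at all. It is done by setting the RedrivePolicy attribute on the original queue with set_queue_attributes; the policy is a JSON string containing deadLetterTargetArn (the DLQ’s ARN) and maxReceiveCount. SQS uses this policy to move a message to the DLQ once its receive count exceeds maxReceiveCount. You only do this configuration step once per queue-DLQ pair, and it is also what makes the DLQ a valid source for a move task, because start_message_move_task only accepts a queue that is configured as a dead-letter queue.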
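Before starting a redrive, it can be worth confirming that the original queue really does point at the DLQ. The following is a minimal sketch, assuming a Boto3 SQS client is passed in; the helper name dlq_is_configured is hypothetical, while get_queue_attributes and the RedrivePolicy attribute are the standard SQS way to read this setting back.

```python
import json


def dlq_is_configured(sqs_client, source_queue_url, dlq_arn):
    """Return True if the source queue's RedrivePolicy targets dlq_arn.

    Hypothetical helper for illustration; sqs_client is expected to be
    a Boto3 SQS client (boto3.client('sqs')).
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=source_queue_url,
        AttributeNames=['RedrivePolicy'],
    )
    # The attribute is absent when no redrive policy has been set.
    raw_policy = response.get('Attributes', {}).get('RedrivePolicy')
    if raw_policy is None:
        return False
    # RedrivePolicy is stored as a JSON string, so it has to be decoded.
    policy = json.loads(raw_policy)
    return policy.get('deadLetterTargetArn') == dlq_arn
```

With the names used in this article, you would call it as dlq_is_configured(sqs, source_queue_url, 'arn:aws:sqs:us-east-1:123456789012:my-processing-dlq').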

Once that’s set up, you can initiate the actual redrive by calling start_message_move_task with the DLQ’s ARN as the SourceArn and the original queue’s ARN as the DestinationArn. This tells SQS to pull messages from the DLQ and send them to the original queue.
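The redrive call can be wrapped in a small helper. This is a sketch under stated assumptions, not a definitive implementation: the function name start_redrive and its parameters are hypothetical, and it relies on two optional request fields of start_message_move_task, DestinationArn (omit it to send messages back to their original source queues) and MaxNumberOfMessagesPerSecond (a fixed movement rate).

```python
def start_redrive(sqs_client, dlq_arn, destination_arn=None, max_per_second=None):
    """Start a move task out of a DLQ and return its TaskHandle.

    Hypothetical helper for illustration; sqs_client is expected to be
    a Boto3 SQS client (boto3.client('sqs')).
    """
    params = {'SourceArn': dlq_arn}
    # Only send the optional fields when the caller supplied them.
    if destination_arn is not None:
        params['DestinationArn'] = destination_arn
    if max_per_second is not None:
        params['MaxNumberOfMessagesPerSecond'] = max_per_second
    response = sqs_client.start_message_move_task(**params)
    return response['TaskHandle']
```

Capping the rate is a cautious default when you are not yet sure the consumers are healthy, since it limits how quickly messages can flow back into the processing queue.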

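The TaskHandle returned by start_message_move_task is an identifier for this specific move operation. You pass it to cancel_message_move_task if you need to stop the move. To monitor progress, call list_message_move_tasks with the DLQ’s ARN as the SourceArn; each entry in Results reports a Status (RUNNING, COMPLETED, FAILED, CANCELLING, or CANCELLED) along with approximate counts of messages moved and still to move. The task completes when all messages have been moved, or it fails and reports a FailureReason if there are issues.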
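A polling loop for a move task might look like the sketch below. It assumes a Boto3 SQS client is passed in and uses list_message_move_tasks, which reports recent move tasks for a given source queue; the function name wait_for_move_task, the poll interval, and the poll limit are all hypothetical choices, and the sketch assumes the task of interest is the most recent one for that DLQ.

```python
import time


def wait_for_move_task(sqs_client, dlq_arn, poll_seconds=5.0, max_polls=60):
    """Poll the latest move task for dlq_arn until it stops running.

    Hypothetical helper for illustration. Returns the task entry from
    'Results', or None if no move task exists for the queue.
    """
    for _ in range(max_polls):
        response = sqs_client.list_message_move_tasks(
            SourceArn=dlq_arn,
            MaxResults=1,  # assumption: we only care about the latest task
        )
        results = response.get('Results', [])
        if not results:
            return None
        task = results[0]
        # RUNNING and CANCELLING are the only in-progress states.
        if task['Status'] not in ('RUNNING', 'CANCELLING'):
            return task
        time.sleep(poll_seconds)
    raise TimeoutError(f"Move task for {dlq_arn} still in progress")
```

The returned entry carries the final Status, so the caller can branch on COMPLETED versus FAILED or CANCELLED and, for failures, inspect FailureReason.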

The most common reason a message ends up in a DLQ is that your consumers repeatedly failed to process and delete it before the visibility timeout expired, until its receive count exceeded maxReceiveCount. The start_message_move_task API lets you move those messages back programmatically instead of manually through the console, which is essential for automated recovery workflows.

When you successfully redrive messages, the next immediate problem you’ll encounter is that your consumers might still be broken, and the messages could end up back in the DLQ.
