The SQS Request-Response pattern is fundamentally a lie; it’s not about getting a direct reply, but about cleverly managing the lack of one.
Imagine a world where you can’t directly ask a service a question and get an answer back immediately. That’s the reality for many asynchronous systems. SQS, Amazon’s Simple Queue Service, is a prime example. You can send a message (a "request") into an SQS queue, and some worker process will eventually pick it up, do the work, and produce a result. But how does the original sender know which result belongs to their request, especially if multiple requests are flying around? This is where correlation comes in.
Let’s see this in action. We’ll set up a scenario with two SQS queues: request-queue and response-queue.
    // Message sent to request-queue
    {
      "message_id": "req-12345",
      "payload": {
        "operation": "calculate_sum",
        "numbers": [10, 20, 30]
      }
    }
A worker picks this up, performs the calculation, and then sends a response to the response-queue.
    // Message sent to response-queue
    {
      "original_message_id": "req-12345", // <-- The correlation key!
      "status": "success",
      "result": 60
    }
The original sender, which has been polling response-queue, finds this message. By matching original_message_id with the message_id it sent earlier, it knows this result is for its request.
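To make the round trip concrete, here is a minimal in-memory sketch. It assumes two `queue.Queue` objects standing in for request-queue and response-queue, and the helper names (`send_request`, `run_worker_once`, `collect_responses`) are made up for illustration. Real SQS delivers at least once and does not guarantee ordering on standard queues, so treat this as a picture of the matching logic only.

```python
import json
import queue

# In-memory stand-ins for the two SQS queues (an assumption for illustration).
request_queue = queue.Queue()
response_queue = queue.Queue()

def send_request(message_id, numbers):
    """Sender side: enqueue a request that carries its own ID."""
    body = {"message_id": message_id,
            "payload": {"operation": "calculate_sum", "numbers": numbers}}
    request_queue.put(json.dumps(body))

def run_worker_once():
    """Worker side: take one request, compute the sum, echo the ID back."""
    request = json.loads(request_queue.get())
    response = {"original_message_id": request["message_id"],
                "status": "success",
                "result": sum(request["payload"]["numbers"])}
    response_queue.put(json.dumps(response))

def collect_responses(pending):
    """Sender side: match each response to the request that caused it."""
    matched = {}
    while not response_queue.empty():
        response = json.loads(response_queue.get())
        key = response["original_message_id"]
        if key in pending:          # only accept IDs we are waiting for
            pending.remove(key)
            matched[key] = response["result"]
    return matched

pending = {"req-12345", "req-12346"}
send_request("req-12345", [10, 20, 30])
send_request("req-12346", [1, 2, 3])
run_worker_once()
run_worker_once()
results = collect_responses(pending)
print(results)
```

With two requests in flight, each result ends up keyed by the ID of the request that produced it, and the pending set is empty once both responses have been matched.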
The core problem this pattern solves is state management in distributed systems. When you send a message, you don’t block waiting for a response. The worker processing the message might take seconds, minutes, or even hours. During that time, the original sender could be handling thousands of other requests. Without a way to link a specific response back to its originating request, the system devolves into chaos. You’d have no idea which calculation result, which file processing status, or which data transformation output belongs to which original command.
Internally, SQS is a robust, highly available managed message queue. It doesn’t natively support request-response. You send a message, and it sits in the queue. You receive a message, and it becomes invisible to other consumers for the duration of the visibility timeout; it is removed only when the consumer explicitly deletes it. The "response" part is entirely a convention you build on top. This means you need to implement the correlation logic yourself. The most common mechanism is to include a unique identifier from the original request within the response message. This identifier could be:
- A correlation_id: A GUID or UUID generated by the client before sending the request.
- The original SQS MessageId: SQS assigns a unique MessageId to each message it accepts and returns it in the SendMessage response. The sender records that value, and the worker, which sees the same MessageId when it receives the message, echoes it back in the response.
- A custom request identifier: If your application has its own internal request tracking system, you can use that ID.
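As a rough sketch of the first option, the client can mint a UUID and embed it in the message body before sending. The function names `build_request` and `build_response` and the `correlation_id` field name are assumptions chosen for this example; any field name works as long as the sender and the worker agree on it.

```python
import json
import uuid

def build_request(operation, numbers):
    """Return (correlation_id, message_body) for a new request."""
    correlation_id = str(uuid.uuid4())  # generated client-side, before sending
    body = json.dumps({
        "correlation_id": correlation_id,
        "payload": {"operation": operation, "numbers": numbers},
    })
    return correlation_id, body

def build_response(request_body, result):
    """Worker side: echo the request's correlation ID into the response."""
    request = json.loads(request_body)
    return json.dumps({
        "correlation_id": request["correlation_id"],
        "status": "success",
        "result": result,
    })

correlation_id, request_body = build_request("calculate_sum", [10, 20, 30])
response_body = build_response(request_body, 60)
```

Generating the ID on the client means the sender knows it before the message ever reaches SQS, which avoids depending on the value returned by the send call.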
The sender must generate this unique ID, include it in the request message, and then poll a separate response queue. When it receives a message from the response queue, it extracts the correlation ID and matches it against the IDs of the requests it has outstanding.
The sender’s polling logic is crucial. It needs to:
- Generate a unique ID for each outgoing request.
- Send the request message to the request queue, including the unique ID.
- Store the mapping of unique_id to request details (or just the fact that a request is pending).
- Poll the designated response queue.
- When a message arrives in the response queue, extract the correlation ID.
- Look up the pending request associated with that correlation ID.
- Process the response, remove the request from its pending list, and delete the response message from the queue.
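The sender steps above might be sketched like this. The `Requester` class is hypothetical; it expects an object with the same `send_message`, `receive_message`, and `delete_message` keyword arguments as a boto3 SQS client (for example `boto3.client("sqs")`), and the queue URLs are whatever your account uses. The `correlation_id` body field is an assumption of this example.

```python
import json
import uuid

class Requester:
    """Tracks outstanding requests and matches responses by correlation ID."""

    def __init__(self, sqs, request_url, response_url):
        self.sqs = sqs                  # boto3-style SQS client (assumed)
        self.request_url = request_url
        self.response_url = response_url
        self.pending = {}               # correlation_id -> request payload

    def send(self, payload):
        """Send one request and remember that it is outstanding."""
        correlation_id = str(uuid.uuid4())
        body = json.dumps({"correlation_id": correlation_id, "payload": payload})
        self.sqs.send_message(QueueUrl=self.request_url, MessageBody=body)
        self.pending[correlation_id] = payload
        return correlation_id

    def poll_once(self, wait_seconds=20):
        """Receive one batch of responses; return {correlation_id: response}."""
        reply = self.sqs.receive_message(
            QueueUrl=self.response_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=wait_seconds,   # long polling
        )
        matched = {}
        for message in reply.get("Messages", []):
            response = json.loads(message["Body"])
            correlation_id = response.get("correlation_id")
            if correlation_id in self.pending:
                del self.pending[correlation_id]
                matched[correlation_id] = response
                self.sqs.delete_message(
                    QueueUrl=self.response_url,
                    ReceiptHandle=message["ReceiptHandle"],
                )
            # Responses with unknown IDs are not deleted here; they become
            # visible again after the visibility timeout.
        return matched
```

Leaving unmatched responses undeleted is one possible choice when several senders share a response queue. Another is to give each sender its own response queue so that every message it receives is meant for it.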
The worker needs to:
- Receive a message from the request queue.
- Extract the correlation ID from the message payload.
- Perform its task.
- Construct a response message, including the extracted correlation ID.
- Send the response message to the designated response queue.
- Delete the request message from the request queue so that it is not redelivered.
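A worker following these steps could look roughly like the sketch below. The function `handle_one_batch` is a hypothetical name, the `sqs` argument is assumed to behave like a boto3 SQS client, and the message layout (`correlation_id`, `payload.numbers`) is an assumption carried over from the sum example.

```python
import json

def handle_one_batch(sqs, request_url, response_url, wait_seconds=20):
    """Process one batch of requests; return how many were handled."""
    reply = sqs.receive_message(
        QueueUrl=request_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=wait_seconds,   # long polling
    )
    handled = 0
    for message in reply.get("Messages", []):
        request = json.loads(message["Body"])
        correlation_id = request["correlation_id"]

        # The actual work; here, the sum from the running example.
        result = sum(request["payload"]["numbers"])

        response = {"correlation_id": correlation_id,
                    "status": "success",
                    "result": result}
        sqs.send_message(QueueUrl=response_url,
                         MessageBody=json.dumps(response))

        # Delete only after the response is sent, so a crash before this
        # point lets SQS redeliver the request once it becomes visible again.
        sqs.delete_message(QueueUrl=request_url,
                           ReceiptHandle=message["ReceiptHandle"])
        handled += 1
    return handled
```

Deleting the request last trades possible duplicate responses for not losing requests, which is why the sender should tolerate seeing the same correlation ID more than once.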
This pattern is surprisingly resilient. If a worker crashes after processing a request but before sending the response, what happens next depends on whether it had already deleted the request message. If it had not, the message becomes visible again once the visibility timeout expires and another worker can pick it up; if it had, the request is lost and the sender simply never receives a correlated response. If the sender crashes, the worker may still process the request, but the response will sit in the response queue unread until someone consumes it or the queue’s message retention period expires. The key is that the correlation ID allows for eventual reconciliation, even if the original sender isn’t there to receive the response immediately.
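Because a response may never arrive, or may arrive twice after a redelivery, the sender usually wants a deadline on each pending entry. The sketch below is one way to do that; the `PendingRequests` class and its method names are hypothetical, and the timeout value is something you would pick for your own workload.

```python
import time

class PendingRequests:
    """Pending correlation IDs with a per-request deadline."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock              # injectable so tests can fake time
        self._deadlines = {}            # correlation_id -> deadline

    def add(self, correlation_id):
        """Start waiting for a response to this request."""
        self._deadlines[correlation_id] = self.clock() + self.timeout_seconds

    def resolve(self, correlation_id):
        """Return True for the first response to a pending ID, else False."""
        return self._deadlines.pop(correlation_id, None) is not None

    def expire(self):
        """Remove and return the IDs whose deadline has passed."""
        now = self.clock()
        expired = [cid for cid, deadline in self._deadlines.items()
                   if deadline <= now]
        for cid in expired:
            del self._deadlines[cid]
        return expired
```

Calling `expire()` on each polling pass gives the sender a list of requests to retry or report as failed, and `resolve()` returning `False` is the signal that a response is a duplicate or belongs to a request that was already given up on.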
The most surprising thing about this pattern is that a single logical operation often involves two SQS queues, one for requests and one for responses, and that the sender polls the response queue without knowing which messages are meant for it until it checks each correlation ID.
The next step is handling failures and ensuring idempotency when requests are processed multiple times.