CloudWatch alarms on SQS metrics can do more than alert on queue depth; they can also watch the rate of messages flowing into your queue.

Let’s see this in action. Imagine you have a critical ingestion pipeline. If your upstream producer, perhaps a web server or an IoT device fleet, suddenly stops sending data, your SQS queue will remain empty, and your downstream processor will idle, potentially leading to missed events or stale data. An alarm on NumberOfMessagesSent can catch this at the source.

Here’s how you’d set it up in AWS CloudWatch:

  1. Navigate to CloudWatch Alarms: In the AWS Management Console, go to CloudWatch, then Alarms, and click "Create alarm."

  2. Select Metric:

    • Click "Select metric."
    • Under "All metrics," choose "SQS," then "Queue Metrics."
    • You’ll see a list of your queues. Select the specific queue you want to monitor.
    • Look for the NumberOfMessagesSent metric. This metric represents the number of messages sent to the queue by producers.
  3. Configure Alarm:

    • Metric: NumberOfMessagesSent
    • Statistic: Choose Sum. This is crucial because we want to know the total number of messages sent over a period, not the average or maximum, which could hide dips.
    • Period: Select a period that makes sense for your ingestion rate. For high-throughput systems, you might use 1 minute. For less frequent events, 5 minutes or even 15 minutes could be appropriate. Let’s use 5 minutes for this example.
    • Threshold type: Choose Static.
    • Whenever NumberOfMessagesSent is…
      • Condition: Less than
      • Threshold value: This is the critical part. You need to establish a baseline for your expected message rate. If your system normally sends at least 100 messages every 5 minutes, you’d set this to 100. If it’s a low-volume queue, maybe it’s 10. The goal is to pick a value at or just below the lowest volume you see while the system is healthy, so that normal quiet periods don’t trip the alarm. Let’s assume a healthy system sends at least 50 messages per 5-minute interval, so we set the threshold to 50; the alarm then fires when a period’s total drops below 50.
  4. Configure Actions:

    • Notification: Choose an SNS topic to send alerts to. This topic can then trigger emails, SMS messages, Lambda functions, etc. You’ll likely want to configure an action for In alarm.
    • Auto Scaling: (Optional) You could potentially trigger scaling actions, though this is less common for NumberOfMessagesSent alarms than for queue depth.
  5. Add Name and Description: Give your alarm a clear name, like SQS-MyIngestionQueue-LowMessageRate and a description explaining what it monitors.
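
The same alarm can also be defined in code. Here is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured. The helper names (build_low_rate_alarm, create_alarm) and the SNS topic ARN are hypothetical; put_metric_alarm is the boto3 CloudWatch client call. Treating missing data as breaching is an assumption that fits a "producer went silent" alarm, not something the console steps above set for you.

```python
from typing import Any, Dict


def build_low_rate_alarm(queue_name: str, sns_topic_arn: str,
                         threshold: int = 50,
                         period_seconds: int = 300) -> Dict[str, Any]:
    """Return keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"SQS-{queue_name}-LowMessageRate",
        "AlarmDescription": (
            f"Fewer than {threshold} messages sent to {queue_name} "
            f"in {period_seconds // 60} minutes"
        ),
        "Namespace": "AWS/SQS",
        "MetricName": "NumberOfMessagesSent",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "LessThanThreshold",
        # Assumption: a period with no data at all counts as a breach.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    params = build_low_rate_alarm(
        "MyIngestionQueue",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # hypothetical ARN
    )
    print(params["AlarmName"])
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test the alarm definition before anything is created in your account.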

How it Works Internally:

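SQS publishes NumberOfMessagesSent to CloudWatch at one-minute intervals for active queues. For each Period, CloudWatch aggregates those data points with the statistic you chose (Sum) and compares the result to your Threshold value. Once the number of breaching data points reaches the alarm’s "Datapoints to alarm" setting (the console defaults to 1 out of 1, but you can require, say, 2 out of 2 to ignore a single slow period), the alarm transitions to the ALARM state. One caveat: SQS stops publishing metrics for a queue that has been inactive for roughly six hours, so it’s worth setting the alarm’s missing data treatment to "Treat missing data as bad (breaching threshold)"; otherwise a completely silent queue can land in INSUFFICIENT_DATA rather than ALARM.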
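
To make the evaluation concrete, here is a deliberately simplified Python model of a "less than threshold" alarm. It is a sketch for building intuition, not CloudWatch’s actual algorithm: the function name alarm_state and its parameters are made up for illustration, and it ignores the INSUFFICIENT_DATA state entirely.

```python
from typing import List, Optional


def alarm_state(period_sums: List[Optional[float]], threshold: float,
                datapoints_to_alarm: int = 1, evaluation_periods: int = 1,
                missing_is_breaching: bool = True) -> str:
    """Simplified model of a 'less than threshold' alarm.

    period_sums holds one Sum per period, oldest first; None means no
    data arrived for that period. missing_is_breaching=False models
    treating missing data as not breaching.
    """
    # Only the most recent evaluation_periods values are considered.
    window = period_sums[-evaluation_periods:]
    breaching = 0
    for value in window:
        if value is None:
            if missing_is_breaching:
                breaching += 1
        elif value < threshold:
            breaching += 1
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    # Three 5-minute periods; the latest one fell to 30 messages.
    print(alarm_state([120, 80, 30], threshold=50))
    # Require two low periods in a row before alarming.
    print(alarm_state([60, 70, 20], threshold=50,
                      datapoints_to_alarm=2, evaluation_periods=2))
```

With the default one-period window the first call reports ALARM, because the latest sum of 30 is below 50. The second call reports OK, since only one of the last two periods is below the threshold.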

This alarm is effective because it doesn’t wait for messages to pile up and potentially cause downstream processing delays. Instead, it detects a cessation or significant reduction in incoming data. This allows you to investigate upstream issues, such as producer failures, network connectivity problems, or configuration errors, before they impact your consumers or lead to data loss.

The most surprising thing about NumberOfMessagesSent is that it’s a producer-side metric. You’re essentially alerting on the health of the source that’s feeding your queue, not just the queue itself. This provides a proactive way to detect problems that might otherwise go unnoticed until your consumers start complaining about a lack of work or stale data.

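Consider the implications of Sum vs. Average for NumberOfMessagesSent. CloudWatch evaluates each period on its own, so a burst in one period never offsets silence in a later one, whichever statistic you pick. The difference is in what the number means. Average divides the period’s total by the number of data points recorded in it, so it describes a typical data point rather than the volume; if the quiet minutes record no data points, one busy minute followed by silence can still produce a healthy-looking average. Sum is the total number of messages sent in the period, which is exactly the quantity your threshold is meant to bound.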
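
A small worked example shows how the two statistics can disagree within a single 5-minute period. This is a sketch under stated assumptions: the data points are invented, one data point per minute is assumed, and the "stalled" case assumes the silent minutes recorded no data points at all.

```python
from typing import List

THRESHOLD_SUM = 50      # messages per 5-minute period
THRESHOLD_AVERAGE = 10  # the "same" limit expressed per one-minute data point


def period_sum(datapoints: List[float]) -> float:
    """Total messages sent in the period."""
    return sum(datapoints)


def period_average(datapoints: List[float]) -> float:
    """Mean of the data points that were actually recorded."""
    return sum(datapoints) / len(datapoints)


healthy = [12, 9, 11, 10, 8]  # five one-minute data points
stalled = [12]                # assumed: producer died after the first minute

for name, points in (("healthy", healthy), ("stalled", stalled)):
    by_sum = period_sum(points) < THRESHOLD_SUM
    by_avg = period_average(points) < THRESHOLD_AVERAGE
    print(f"{name}: sum={period_sum(points)} alarm={by_sum}, "
          f"average={period_average(points):.1f} alarm={by_avg}")
```

In the stalled period the sum is 12, well under 50, so a Sum-based alarm fires. The average is 12.0, which sits above the per-minute limit of 10, so an Average-based alarm stays quiet even though the producer has stopped.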

Once you have NumberOfMessagesSent alarms working, the next logical step is to correlate this metric with the queue’s NumberOfMessagesReceived metric, which reflects consumer activity, to ensure a healthy end-to-end data flow.
