Monitoring storage performance and capacity is crucial for maintaining application responsiveness and preventing outages.

Let’s look at a typical storage setup and how we’d monitor it. Imagine a PostgreSQL database running on an EC2 instance with an EBS volume attached.

The Setup

  • EC2 Instance: i-0abcdef1234567890
  • EBS Volume: vol-0123456789abcdef0
  • Database: PostgreSQL 14
  • Operating System: Ubuntu 22.04 LTS

Monitoring IOPS (Input/Output Operations Per Second)

IOPS measures how many read or write operations your storage can handle per second. High IOPS is critical for transactional databases like PostgreSQL.

1. Checking EBS Volume IOPS Limits

EBS volumes have provisioned IOPS limits. gp3 volumes, for example, offer 3,000 base IOPS and can be provisioned up to 16,000 IOPS. io2 volumes offer higher performance and durability.

To check your EBS volume’s current performance and limits, you can use the AWS CLI:

aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 --query 'Volumes[*].[VolumeId, VolumeType, Iops, Size, Throughput]' --output table

This command will show you the VolumeType (e.g., gp3), Iops (provisioned IOPS), Size (in GiB), and Throughput (in MiB/s).

2. Monitoring Actual IOPS on the Instance

On the EC2 instance itself, you can use iostat to see the actual I/O activity.

iostat -xd 5

This command prints extended I/O statistics every 5 seconds. Watch the %util column for the device backing your EBS volume (often nvme0n1, or a partition like nvme0n1p1, on modern EC2 instances). A %util consistently at or near 100% suggests the volume is saturated, though on NVMe devices that service requests in parallel %util can read 100% before true saturation, so corroborate it against your provisioned IOPS.

The r/s and w/s columns show read and write operations per second; their sum is the volume's actual IOPS, which you can compare directly against the provisioned figure from describe-volumes.
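If iostat isn't available, the same counters can be read straight from /proc/diskstats, where field 4 is reads completed and field 8 is writes completed. A minimal sketch that samples the counters twice and prints the deltas as IOPS (the device-name detection and 5-second interval are assumptions; pass your device name as the first argument):

```shell
#!/bin/sh
# Compute read/write IOPS for a block device by sampling /proc/diskstats
# twice. Defaults below are assumptions; pass your device name as $1.
DEV=${1:-$(awk '$3 ~ /^(nvme[0-9]+n[0-9]+|xvd[a-z]|sd[a-z])$/ { print $3; exit }' /proc/diskstats)}
INTERVAL=${2:-5}

snapshot() {
  # Field 4 = reads completed, field 8 = writes completed
  awk -v dev="$DEV" '$3 == dev { print $4, $8 }' /proc/diskstats
}

# The trailing "0 0" keeps the script safe if the device is not found
set -- $(snapshot) 0 0; r1=$1; w1=$2
sleep "$INTERVAL"
set -- $(snapshot) 0 0; r2=$1; w2=$2

echo "read IOPS:  $(( (r2 - r1) / INTERVAL ))"
echo "write IOPS: $(( (w2 - w1) / INTERVAL ))"
```

The counters are cumulative since boot, which is why two samples and a delta are needed.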

3. Identifying IOPS Bottlenecks

If iostat shows sustained high %util and r/s + w/s is at or near the volume's provisioned IOPS, your EBS volume is the bottleneck.

The Fix:

  • For gp3 volumes: Increase provisioned IOPS.

    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --iops 8000
    

    This command provisions 8,000 IOPS on the volume. gp3 volumes let you scale IOPS independently of size (up to 500 IOPS per provisioned GiB), and the change typically takes effect within minutes; you can track its progress with aws ec2 describe-volumes-modifications.

  • For other volume types (e.g., gp2): IOPS are tied to size. You’d need to increase the volume size.

    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500
    

    This increases the gp2 volume to 500 GiB. gp2 provisions 3 IOPS per GiB, so the resized volume gets a 1,500 IOPS baseline (gp2 volumes under about 1 TiB can also burst to 3,000 IOPS).

Monitoring Latency

Latency is the time it takes for an I/O operation to complete. High latency directly impacts application performance, making it feel sluggish.

1. Checking EBS Volume Latency

AWS CloudWatch provides detailed metrics for EBS volumes. There is no direct latency metric; instead, you derive latency from the time and operation counters:

  • Metric Names: VolumeTotalReadTime and VolumeTotalWriteTime (total seconds spent on I/O), together with VolumeReadOps and VolumeWriteOps
  • Namespace: AWS/EBS
  • Dimensions: VolumeId

You can view these metrics in the CloudWatch console or via the AWS CLI. For example, to get the total time spent on reads over the last hour:

aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name VolumeTotalReadTime \
    --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
    --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 3600 \
    --statistics Sum

Dividing this Sum by the Sum of VolumeReadOps over the same period gives the average read latency per operation.
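Per-operation latency can be derived from the AWS/EBS metrics VolumeTotalReadTime and VolumeReadOps: divide the Sum of the former by the Sum of the latter over the same window. The numbers below are made-up example values standing in for real CloudWatch Sums:

```shell
# Average read latency = Sum(VolumeTotalReadTime) / Sum(VolumeReadOps).
# Example values (assumptions) standing in for real CloudWatch output:
total_read_time=12.5   # seconds spent on reads over the hour
read_ops=25000         # read operations over the same hour

awk -v t="$total_read_time" -v n="$read_ops" \
  'BEGIN { printf "average read latency: %.2f ms\n", t / n * 1000 }'
```

With these example values the result is 0.50 ms per read, comfortably healthy for an SSD-backed volume.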

2. Monitoring Latency on the Instance

iostat also provides latency metrics.

iostat -xd 5

Look at the r_await and w_await columns (older sysstat versions show a single await column): the average time, in milliseconds, that read and write requests spend from issue to completion, including time queued and time being serviced. Consistently high values indicate latency issues.

3. Identifying Latency Bottlenecks

If CloudWatch metrics or iostat show consistently high await times (e.g., > 20ms for typical applications, or > 100ms for sensitive ones), your storage is likely the bottleneck. This can be due to an overloaded volume (hitting IOPS/throughput limits) or underlying hardware issues.

The Fix:

  • Increase IOPS/Throughput: As described above, if the EBS volume is saturated, increasing its provisioned IOPS or, on gp3 volumes, its independently provisioned throughput can resolve latency issues.

    # Example for increasing throughput on a gp3 volume
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --throughput 250
    

    This sets the volume's throughput to 250 MiB/s (gp3 starts at a 125 MiB/s baseline and can be provisioned up to 1,000 MiB/s).

  • Database Tuning: Sometimes, high latency is not directly an EBS issue but a symptom of inefficient database queries or configurations that generate excessive I/O. Analyze PostgreSQL logs for slow queries.

  • Instance Type: If you’re on a very small instance type, the network bandwidth to the EBS volume might be a bottleneck. Consider a larger instance type.
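On the database-tuning front, PostgreSQL can log slow statements and, with the pg_stat_statements extension enabled, rank queries by block I/O. A sketch, assuming local psql access as the postgres user and an arbitrary 500 ms threshold:

```shell
# Log every statement slower than 500 ms (threshold is an assumption)
psql -U postgres -c "ALTER SYSTEM SET log_min_duration_statement = '500ms'"
psql -U postgres -c "SELECT pg_reload_conf()"

# With pg_stat_statements enabled, list the most I/O-hungry queries
psql -U postgres -c "
  SELECT query, calls, shared_blks_read, shared_blks_written
  FROM pg_stat_statements
  ORDER BY shared_blks_read + shared_blks_written DESC
  LIMIT 10"
```

Queries at the top of that list are the first candidates for new indexes or rewrites before you pay for more IOPS.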

Monitoring Capacity

Running out of storage space leads to failed writes and application outages; PostgreSQL, for example, will shut down rather than accept writes it cannot persist.

1. Checking EBS Volume Capacity

You can check the EBS volume size and usage on the instance.

df -h /path/to/your/mountpoint

For example, if your PostgreSQL data directory is mounted on /var/lib/postgresql/data:

df -h /var/lib/postgresql/data

This will show you the total size, used space, available space, and the percentage used.
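A recurring version of this check is easy to script for cron. The default mount point and 90% threshold below are assumptions; pass your own as arguments:

```shell
#!/bin/sh
# Warn when a filesystem crosses a usage threshold. Suitable for cron.
MOUNT=${1:-/}        # assumed mount point; use your data directory's mount
THRESHOLD=${2:-90}   # assumed percentage threshold

# -P forces POSIX single-line output; field 5 is "Use%"
used=$(df -P "$MOUNT" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if [ "$used" -ge "$THRESHOLD" ]; then
  echo "WARNING: $MOUNT is ${used}% full (threshold ${THRESHOLD}%)"
else
  echo "OK: $MOUNT is ${used}% full"
fi
```

Wiring the WARNING branch to your alerting system turns a 3 a.m. outage into a routine ticket.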

2. Monitoring EBS Volume Size in AWS

You can also check the volume size directly in AWS:

aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 --query 'Volumes[*].Size' --output text

3. Identifying Capacity Bottlenecks

If df -h shows that the filesystem is over 90-95% full, you are approaching a capacity bottleneck.

The Fix:

  • Modify EBS Volume Size: Increase the size of the EBS volume.

    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500
    

    This increases the volume to 500 GiB. After the modification, you’ll need to extend the filesystem to utilize the new space.

  • Extend the Filesystem: The new space isn’t usable until the partition (if the volume is partitioned) and the filesystem are grown.

    # If the volume is partitioned, grow partition 1 first
    sudo growpart /dev/nvme0n1 1
    # ext4:
    sudo resize2fs /dev/nvme0n1p1
    # XFS takes the mount point, not the device:
    sudo xfs_growfs /var/lib/postgresql/data
    

    resize2fs grows an ext2/3/4 filesystem to fill its block device; XFS filesystems must be grown with xfs_growfs instead.

  • Clean Up Data: If the growth is unexpected, investigate what is consuming the space. PostgreSQL can grow rapidly if not managed: check large tables, table and index bloat (make sure autovacuum is keeping up), WAL files in pg_wal, and temporary files.
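To see where the space went, du on the data directory is the fastest starting point. The path below is the example mount point used in this walkthrough; errors are suppressed so the commands degrade gracefully if your layout differs:

```shell
# Assumed data directory; adjust to match your mount point
DATA_DIR=/var/lib/postgresql/data

# Largest top-level directories under the data directory
sudo du -h --max-depth=1 "$DATA_DIR" 2>/dev/null | sort -rh | head -10

# WAL lives in pg_wal; if it dominates, check WAL archiving and
# replication slots, which can pin old WAL segments indefinitely
sudo du -sh "$DATA_DIR/pg_wal" 2>/dev/null || echo "pg_wal not found under $DATA_DIR"
```

An oversized pg_wal is a common culprit: a stalled archive_command or an abandoned replication slot prevents WAL segments from being recycled.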

The Next Step

Once you’ve addressed IOPS, latency, and capacity, the next common monitoring challenge is throughput limits, especially for read-heavy workloads or large sequential writes. On gp3 volumes this is governed by the provisioned Throughput parameter; on io2, throughput scales with provisioned IOPS.

Want structured learning?

Take the full Storage course →