Monitoring storage performance and capacity is crucial for maintaining application responsiveness and preventing outages.
Let’s look at a typical storage setup and how we’d monitor it. Imagine a PostgreSQL database running on an EC2 instance with an EBS volume attached.
The Setup

- EC2 Instance: i-0abcdef1234567890
- EBS Volume: vol-0123456789abcdef0
- Database: PostgreSQL 14
- Operating System: Ubuntu 22.04 LTS
Monitoring IOPS (Input/Output Operations Per Second)
IOPS measures how many read or write operations your storage can handle per second. High IOPS is critical for transactional databases like PostgreSQL.
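As a rough sizing sketch, you can estimate the IOPS a transactional workload needs by multiplying transactions per second by the I/Os each transaction performs (both figures below are illustrative, not measurements from the setup above):

```python
def required_iops(tps, ios_per_txn):
    """Rough IOPS estimate: transactions/sec times I/O operations per transaction."""
    return tps * ios_per_txn

# Hypothetical workload: 500 transactions/sec, ~8 I/Os each
print(required_iops(500, 8))  # 4000 -- above gp3's 3,000 IOPS baseline
```

Real transactions vary widely in I/O cost (index lookups, WAL writes, checkpoints), so treat this as an order-of-magnitude check, not a provisioning formula.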
1. Checking EBS Volume IOPS Limits
EBS volumes have provisioned IOPS limits. gp3 volumes, for example, offer 3,000 base IOPS and can be provisioned up to 16,000 IOPS. io2 volumes offer higher performance and durability.
To check your EBS volume’s current performance and limits, you can use the AWS CLI:
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 --query 'Volumes[*].[VolumeId, VolumeType, Iops, Size, Throughput]' --output table
This command will show you the VolumeType (e.g., gp3), Iops (provisioned IOPS), Size (in GiB), and Throughput (in MiB/s).
2. Monitoring Actual IOPS on the Instance
On the EC2 instance itself, you can use iostat to see the actual I/O activity.
iostat -xd 5
This command outputs I/O statistics every 5 seconds. Look at the %util column for the device corresponding to your EBS volume (often nvme0n1p1 or similar for modern EC2 instances). If %util is consistently at or near 100%, you are likely hitting your EBS volume's IOPS limit; confirm against the provisioned IOPS, since %util alone can overstate saturation on devices that handle requests in parallel.
The r/s and w/s columns show read and write operations per second.
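To alert on these numbers automatically, you can parse the iostat output by its header row rather than by fixed column positions (column names and order vary between sysstat versions, and the sample below is trimmed to a few columns for illustration):

```python
def parse_iostat(output):
    """Parse `iostat -xd` extended output into {device: {column: value}}."""
    lines = [l for l in output.strip().splitlines() if l.strip()]
    # Locate the header row and take the column names after "Device"
    header = next(i for i, l in enumerate(lines) if l.startswith("Device"))
    cols = lines[header].split()[1:]
    stats = {}
    for line in lines[header + 1:]:
        parts = line.split()
        stats[parts[0]] = dict(zip(cols, map(float, parts[1:])))
    return stats

# Trimmed sample output (a real iostat report prints many more columns)
sample = """
Device            r/s     w/s    await  %util
nvme0n1p1      2950.0   150.0    8.40   99.8
"""
dev = parse_iostat(sample)["nvme0n1p1"]
if dev["%util"] >= 95 and dev["r/s"] + dev["w/s"] > 2500:
    print("likely at the volume's IOPS limit")
```

Parsing by header keeps the script working when a sysstat upgrade adds or reorders columns.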
3. Identifying IOPS Bottlenecks
If iostat shows high %util and r/s + w/s are high, your EBS volume is the bottleneck.
The Fix:

- For gp3 volumes: Increase provisioned IOPS.

  aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --iops 8000

  This command modifies the EBS volume to provision 8,000 IOPS. gp3 volumes allow you to scale IOPS independently of size, and the change typically takes effect within minutes.

- For other volume types (e.g., gp2): IOPS are tied to size, so you'd need to increase the volume size.

  aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500

  This increases the gp2 volume to 500 GiB, which in turn increases its provisioned IOPS.
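The gp2 size-to-IOPS relationship is a simple formula (3 IOPS per GiB, floored at 100 and capped at 16,000), so you can compute what a resize buys you before running modify-volume:

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, min 100, max 16,000.

    Ignores burst credits, which let small volumes burst to 3,000 IOPS.
    """
    return min(max(3 * size_gib, 100), 16000)

print(gp2_baseline_iops(100))   # 300
print(gp2_baseline_iops(500))   # 1500
print(gp2_baseline_iops(6000))  # 16000 (cap reached)
```

Note the cap: beyond roughly 5,334 GiB, growing a gp2 volume adds capacity but no additional baseline IOPS.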
Monitoring Latency
Latency is the time it takes for an I/O operation to complete. High latency directly impacts application performance, making it feel sluggish.
1. Checking EBS Volume Latency
AWS CloudWatch provides detailed metrics for EBS volumes.
- Metric Names: VolumeTotalReadTime and VolumeTotalWriteTime (cumulative seconds spent on I/O; divide by VolumeReadOps or VolumeWriteOps over the same period to get average per-operation latency)
- Namespace: AWS/EBS
- Dimensions: VolumeId
You can view these metrics in the CloudWatch console or via the AWS CLI. For example, to get the total read time over the last hour (divide it by the Sum of VolumeReadOps for the same period to derive average read latency):
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeTotalReadTime \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Sum
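CloudWatch reports cumulative I/O time rather than per-operation latency, so a small helper can derive average latency from the Sum statistics of VolumeTotalReadTime (seconds) and VolumeReadOps (the figures below are illustrative):

```python
def avg_latency_ms(total_time_s, op_count):
    """Average per-operation latency (ms) from CloudWatch EBS Sum statistics.

    total_time_s: Sum of VolumeTotalReadTime (seconds) over the period
    op_count:     Sum of VolumeReadOps over the same period
    """
    if op_count == 0:
        return 0.0  # idle volume: no operations, no meaningful latency
    return total_time_s / op_count * 1000.0

# Illustrative: 45 s of cumulative read time across 9,000 reads
print(avg_latency_ms(45, 9000))  # 5.0 ms per read
```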
2. Monitoring Latency on the Instance
iostat also provides latency metrics.
iostat -xd 5
Look for the await column (split into r_await and w_await in newer sysstat releases). This is the average time an I/O spends waiting, including both queue time and service time, in milliseconds. High await values indicate latency issues.
3. Identifying Latency Bottlenecks
If CloudWatch metrics or iostat show consistently high await times (e.g., > 20ms for typical applications, or > 100ms for sensitive ones), your storage is likely the bottleneck. This can be due to an overloaded volume (hitting IOPS/throughput limits) or underlying hardware issues.
The Fix:

- Increase IOPS/Throughput: As described above, if the EBS volume is saturated, increasing its provisioned IOPS or throughput (for gp3 and io2 volumes) can resolve latency issues.

  # Example for increasing throughput on a gp3 volume
  aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --throughput 250

  This sets the throughput to 250 MiB/s.

- Database Tuning: Sometimes high latency is not directly an EBS issue but a symptom of inefficient database queries or configurations that generate excessive I/O. Analyze PostgreSQL logs for slow queries.

- Instance Type: If you're on a very small instance type, the instance's bandwidth to the EBS volume might be the bottleneck. Consider a larger instance type.
Monitoring Capacity
Running out of storage space will inevitably lead to application failures and data loss.
1. Checking EBS Volume Capacity
You can check the EBS volume size and usage on the instance.
df -h /path/to/your/mountpoint
For example, if your PostgreSQL data directory is mounted on /var/lib/postgresql/data:
df -h /var/lib/postgresql/data
This will show you the total size, used space, available space, and the percentage used.
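The same check can run from a monitoring script; Python's standard library exposes the numbers df reads (the 90% threshold mirrors the guidance in this section and is a policy choice, not a fixed rule):

```python
import shutil

def usage_percent(path):
    """Percent of the filesystem containing `path` that is in use."""
    total, used, free = shutil.disk_usage(path)
    return used / total * 100

# On the instance you'd point this at /var/lib/postgresql/data;
# "/" is used here only so the snippet runs anywhere.
pct = usage_percent("/")
if pct > 90:
    print(f"capacity warning: {pct:.1f}% used")
```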
2. Monitoring EBS Volume Size in AWS
You can also check the volume size directly in AWS:
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 --query 'Volumes[*].Size' --output text
3. Identifying Capacity Bottlenecks
If df -h shows that the filesystem is over 90-95% full, you are approaching a capacity bottleneck.
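How urgent 90% full is depends on the growth rate; a quick projection of days until full (growth figures below are illustrative) helps decide between resizing now and cleaning up first:

```python
def days_until_full(free_gib, daily_growth_gib):
    """Project days until the volume fills at the current growth rate."""
    if daily_growth_gib <= 0:
        return float("inf")  # flat or shrinking usage never fills the disk
    return free_gib / daily_growth_gib

# 50 GiB free, growing ~5 GiB/day -> about 10 days of headroom
print(days_until_full(50, 5))  # 10.0
```

A daily growth figure can come from comparing two df snapshots a day apart, or from a CloudWatch disk-usage metric if you run the CloudWatch agent.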
The Fix:

- Modify EBS Volume Size: Increase the size of the EBS volume.

  aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500

  This increases the volume to 500 GiB. After the modification, you'll need to extend the partition and filesystem to utilize the new space.

- Extend the Filesystem: For ext4 or XFS filesystems, this is usually straightforward.

  # Grow partition 1 to fill the resized volume, then the ext4 filesystem
  sudo growpart /dev/nvme0n1 1
  sudo resize2fs /dev/nvme0n1p1

  growpart expands the partition to the end of the device, and resize2fs grows the ext4 filesystem to fill it. (For XFS, run xfs_growfs on the mountpoint instead of resize2fs.)

- Clean Up Data: If the growth is unexpected, investigate what is consuming the space. PostgreSQL can grow rapidly if not managed; check large tables, WAL (Write-Ahead Log) files, and temporary files.
The Next Step
Once you’ve addressed IOPS, latency, and capacity, the next common monitoring challenge will be throughput limitations, especially for read-heavy workloads or large sequential writes, which are governed by the Throughput parameter on EBS gp3 and io2 volumes.
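Throughput and IOPS are linked through I/O size, so you can estimate which limit a workload hits first (the 256 KiB I/O size is illustrative; gp3's 3,000 IOPS and 125 MiB/s baselines are the current defaults):

```python
def throughput_mib_s(iops, io_size_kib):
    """Sequential throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kib / 1024

# 3,000 IOPS of 256 KiB sequential reads would need 750 MiB/s --
# far beyond gp3's 125 MiB/s baseline, so throughput caps out first
print(throughput_mib_s(3000, 256))  # 750.0
```

This is why large sequential scans can stall well below a volume's IOPS limit: each large I/O consumes far more of the throughput budget than a small random one.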