The most surprising thing about tiered storage is that it’s not primarily about saving money, but about performance.

Imagine you have a big, ol’ database. Some of the data in there is accessed constantly – think user login info, current order statuses. This needs to be lightning-fast, so it lives on the "hot" tier, usually high-performance SSDs. Other data, like historical sales reports from last quarter, is accessed less often but still needs to be retrieved reasonably quickly. This goes on the "warm" tier, often lower-cost SSDs or high-RPM HDDs. Then there’s data that might be needed for compliance or infrequent analysis – think logs from five years ago. This can sit on the "cold" tier, which is cheaper, slower storage, often spinning disks. Finally, "archive" is for data you legally must keep but hope to never, ever touch again, typically on tape or object storage designed for extreme durability and low cost over long periods.
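To make the tiers concrete, here’s a minimal sketch of the routing decision. The tier names come from above; the thresholds are illustrative guesses, not from any particular product – real systems tune these per workload:

```python
from datetime import datetime, timedelta

# Illustrative recency thresholds -- real systems tune these per workload.
TIER_THRESHOLDS = [
    ("hot", timedelta(days=7)),     # accessed within the last week
    ("warm", timedelta(days=30)),   # accessed within the last month
    ("cold", timedelta(days=365)),  # accessed within the last year
]

def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the first tier whose access-recency window covers the object."""
    age = now - last_accessed
    for tier, threshold in TIER_THRESHOLDS:
        if age <= threshold:
            return tier
    return "archive"  # keep-forever, hope-to-never-read data

now = datetime(2024, 1, 1)
print(choose_tier(now - timedelta(days=2), now))    # recently read -> hot
print(choose_tier(now - timedelta(days=400), now))  # stale -> archive
```

Real systems make the same decision, just with richer signals (access counts, object size, tags) instead of a single timestamp.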

Let’s see it in action. We’ll use Amazon S3 as our example, as it clearly defines these tiers.

Here’s how you might configure a lifecycle policy to move data:

{
  "Rules": [
    {
      "ID": "Move to Infrequent Access after 30 days",
      "Prefix": "logs/",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        }
      ]
    },
    {
      "ID": "Move to Glacier after 90 days",
      "Prefix": "logs/",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ]
    },
    {
      "ID": "Expire after 365 days",
      "Prefix": "logs/",
      "Status": "Enabled",
      "Expiration": {
        "Days": 365
      }
    }
  ]
}

In this policy:

  • "ID": A human-readable name for the rule.
  • "Prefix": "logs/": This rule applies only to objects within the logs/ folder (or "key prefix" in S3 terms).
  • "Status": "Enabled": The rule is active.
  • "Transitions": These define moving data between storage classes.
    • "Days": 30, "StorageClass": "STANDARD_IA": After 30 days, objects in logs/ move from STANDARD (hot) to STANDARD_IA (Infrequent Access – warm).
    • "Days": 90, "StorageClass": "GLACIER": After 90 days, objects move from STANDARD_IA to GLACIER (cold/archive).
  • "Expiration": This defines deleting data.
    • "Days": 365: After 365 days, objects in logs/ are permanently deleted.

This policy automates the movement: S3 evaluates lifecycle rules roughly once a day and applies them in the background. You don’t have to manually move a single file.
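If you manage the bucket with boto3 (the AWS SDK for Python), a policy like this can be pushed with `put_bucket_lifecycle_configuration`. A sketch – the bucket name is a placeholder, and note that newer API versions express the prefix via a `Filter` element rather than a top-level `Prefix`:

```python
# The same policy as above, in the shape boto3 expects.
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "Move to Infrequent Access after 30 days",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
        },
        {
            "ID": "Move to Glacier after 90 days",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        },
        {
            "ID": "Expire after 365 days",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Expiration": {"Days": 365},
        },
    ]
}

def apply_policy(bucket: str) -> None:
    """Push the lifecycle configuration to a bucket (requires AWS credentials)."""
    import boto3  # imported here so the config above is usable without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=LIFECYCLE_CONFIG
    )
```

Calling `apply_policy("my-bucket")` replaces the bucket’s entire lifecycle configuration, so include every rule you want to keep.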

The core problem tiered storage solves is the mismatch between data access frequency and storage cost/performance. If you put all your data on the fastest, most expensive storage (like NVMe SSDs), you’re wasting money on data that’s rarely accessed. Conversely, if you put all your data on the cheapest, slowest storage (like tape), your frequently accessed data will be so slow to retrieve that your applications become unusable. Tiering strikes a balance: keep frequently accessed data on fast, expensive tiers and less frequently accessed data on slower, cheaper tiers. This optimizes both performance and cost.

Internally, the "tiers" are often just different types of physical hardware or different service configurations. For instance, hot tier might be NVMe SSDs in a SAN, warm tier could be SAS HDDs, cold tier might be SATA HDDs or cloud object storage with high latency, and archive could be magnetic tape libraries or cloud archive services. The system then uses metadata and policies to direct data to the correct tier and manage its movement. Access patterns are usually monitored by the application or a dedicated storage management layer.
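As a rough mental model of that hardware mapping (the latency figures are illustrative orders of magnitude, not benchmarks):

```python
# Tier -> (typical backing hardware, order-of-magnitude access latency).
# Numbers are illustrative, not measurements.
TIERS = {
    "hot":     ("NVMe SSD in a SAN",          "~0.1 ms"),
    "warm":    ("SAS HDD",                    "~10 ms"),
    "cold":    ("SATA HDD / object storage",  "~10-100 ms"),
    "archive": ("tape / cloud archive",       "minutes to hours"),
}

for tier, (hardware, latency) in TIERS.items():
    print(f"{tier:>7}: {hardware:<27} {latency}")
```

The jump from "cold" to "archive" is the one that changes application design: everything above it is interactive, while archive retrieval is a batch operation you wait on.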

The exact levers you control are primarily policies. These policies define:

  1. When data moves (e.g., after X days, based on access count).
  2. Where data moves to (e.g., from hot to warm, warm to cold).
  3. When data is deleted (expiration).
  4. Which data is affected (e.g., by prefix, tags, or object age).
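Those four levers can be captured in a tiny model. Here’s a sketch (function and parameter names are illustrative) that answers "where is this object after N days?" under the example policy shown earlier:

```python
from typing import Optional

def storage_class_after(days: int, key: str, prefix: str = "logs/") -> Optional[str]:
    """Return the storage class an object occupies `days` after creation,
    or None once the expiration rule has deleted it. Mirrors the example
    policy: STANDARD -> STANDARD_IA at 30 days -> GLACIER at 90 -> gone at 365.
    """
    if not key.startswith(prefix):
        return "STANDARD"  # which data is affected: only keys under the prefix
    if days >= 365:
        return None        # when data is deleted: expiration
    if days >= 90:
        return "GLACIER"   # where data moves: second transition
    if days >= 30:
        return "STANDARD_IA"  # when/where: first transition
    return "STANDARD"
```

Note the check order: expiration is evaluated before the transitions, because once an object is past its expiration age the transitions no longer matter.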

Understanding the retrieval costs is crucial. Moving data to cheaper tiers is usually free or very cheap. However, retrieving data from colder tiers often incurs a cost per GB and can have significantly longer retrieval times. For example, retrieving data from Amazon Glacier can take minutes to hours, and there’s a cost associated with the retrieval request itself. This is why archive is for data you really don’t expect to need often.

You might think that setting a lifecycle policy to move data to a cheaper tier automatically means you’ve saved money. However, if you then immediately need to access that data, you’ll pay the retrieval fee for the colder tier, and the overall cost might be higher than if it had stayed on the warmer, slightly more expensive tier. The true savings come from a well-tuned policy that aligns with actual access patterns.
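That trade-off is easy to quantify. A sketch with illustrative prices (check your provider’s current rates): hot storage at $0.023/GB-month with free reads, warm at $0.0125/GB-month plus a $0.01/GB retrieval fee. Whether the move saves money depends entirely on how often you read the data back:

```python
def monthly_cost_per_gb(storage_rate: float, retrieval_fee: float,
                        reads_per_month: float) -> float:
    """Storage cost plus retrieval fees for reading the full GB back."""
    return storage_rate + retrieval_fee * reads_per_month

# Illustrative rates per GB-month (not current AWS pricing).
HOT = dict(storage_rate=0.023, retrieval_fee=0.0)
WARM = dict(storage_rate=0.0125, retrieval_fee=0.01)

for reads in (0, 1, 2):
    hot = monthly_cost_per_gb(**HOT, reads_per_month=reads)
    warm = monthly_cost_per_gb(**WARM, reads_per_month=reads)
    cheaper = "warm" if warm < hot else "hot"
    print(f"{reads} reads/month: hot ${hot:.4f}, warm ${warm:.4f} -> {cheaper} wins")
```

Under these made-up rates, the warm tier wins at zero or one read per month and loses at two – which is exactly why the policy has to match real access patterns, not hoped-for ones.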

The next step is often managing data redundancy and disaster recovery across these tiers.

Want structured learning?

Take the full Storage course →