The most surprising thing about S3 is that it’s not just a place to store files; it’s a massive distributed key-value store that happens to hold your files, and its object storage model is fundamentally different from a traditional file system.
Let’s see it in action. Imagine you have an application that needs to serve images to users. You’d create an S3 bucket, say my-awesome-app-images, and then upload your images.
aws s3 cp my-local-image.jpg s3://my-awesome-app-images/user123/profile.jpg
Now, anyone with the right permissions can fetch the object at https://my-awesome-app-images.s3.amazonaws.com/user123/profile.jpg.
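That virtual-hosted-style URL follows a predictable pattern: bucket name in the hostname, object key as the path. A quick sketch (plain Python, no AWS calls; the exact endpoint varies by region and addressing style):

```python
def object_url(bucket: str, key: str) -> str:
    # Virtual-hosted-style addressing: the bucket becomes part of the
    # hostname, and the full object key is the URL path.
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(object_url("my-awesome-app-images", "user123/profile.jpg"))
# https://my-awesome-app-images.s3.amazonaws.com/user123/profile.jpg
```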
Buckets: The Foundation
A bucket is the top-level container for your S3 data. Think of it like a root directory in a file system, but with some key differences.
- Global Uniqueness: Bucket names must be globally unique across all of AWS. This is why you often see names like my-company-app-logs-2023-07-14.
- Region Specific: While the name is global, the bucket itself resides in a specific AWS region. This is crucial for latency and cost. When you create a bucket, you choose its region.
- No Hierarchy: S3 doesn’t have true directories. When you create an object like user123/profile.jpg, you’re creating a single object named user123/profile.jpg. The / is just part of the object key. This flat structure is a core design choice.
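The flat namespace is easy to see in code. This sketch (plain Python, no AWS calls, with hypothetical keys) mimics how a ListObjectsV2 request with a Delimiter synthesizes “folders” out of flat keys:

```python
# S3 stores only flat keys; "directories" are an illusion created by
# grouping keys that share a prefix up to a delimiter.
keys = [
    "user123/profile.jpg",
    "user123/banner.png",
    "user456/profile.jpg",
    "index.html",
]

def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic ListObjectsV2: return (objects, common_prefixes)."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything past the delimiter collapses into one "folder" entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

print(list_with_delimiter(keys))
# (['index.html'], ['user123/', 'user456/'])
```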
Storage Classes: Cost vs. Access
S3 offers various storage classes, each optimized for different access patterns and cost profiles. Choosing the right class is a major lever for cost optimization.
- S3 Standard: For frequently accessed data. It offers high durability, availability, and performance. Think of this as your default, general-purpose storage.
- S3 Intelligent-Tiering: This is the magic class. It automatically moves data between frequent and infrequent access tiers based on usage patterns, optimizing costs without performance impact or operational overhead. You pay a small monthly monitoring and automation fee per object.
- S3 Standard-Infrequent Access (S3 Standard-IA): For data that is accessed less frequently but requires rapid access when needed. It has lower storage costs than S3 Standard but incurs retrieval fees. Good for backups or disaster recovery data you might need quickly.
- S3 One Zone-Infrequent Access (S3 One Zone-IA): Similar to Standard-IA but stores data in only one Availability Zone. This makes it cheaper but less resilient. If that AZ has an issue, your data could be lost. Best for data that can be easily recreated.
- S3 Glacier Instant Retrieval: For archive data that is rarely accessed but requires millisecond retrieval when requested. It’s cheaper than IA classes for storage but has higher retrieval costs.
- S3 Glacier Flexible Retrieval: For archive data where retrieval times of minutes to hours are acceptable. Significantly cheaper than Instant Retrieval.
- S3 Glacier Deep Archive: The lowest-cost storage class, designed for long-term archiving of data that is accessed once or twice a year. Retrieval times are typically 12-48 hours.
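The classes above boil down to a trade-off between access frequency and acceptable retrieval latency. Here’s a rule-of-thumb sketch of that decision; the thresholds are illustrative assumptions, not AWS guidance, and real sizing should account for retrieval fees and minimum storage durations:

```python
def suggest_storage_class(accesses_per_month: float,
                          max_retrieval_seconds: float) -> str:
    """Illustrative mapping from access pattern to storage class.
    Thresholds here are hypothetical, chosen only to show the shape
    of the decision."""
    if accesses_per_month >= 1:
        return "STANDARD"                 # frequently accessed
    if max_retrieval_seconds < 1:         # needs millisecond access
        # Infrequent but latency-sensitive: IA vs Glacier Instant Retrieval.
        return "STANDARD_IA" if accesses_per_month >= 0.1 else "GLACIER_IR"
    if max_retrieval_seconds < 12 * 3600:
        return "GLACIER"                  # Flexible Retrieval: minutes to hours
    return "DEEP_ARCHIVE"                 # 12-48 hour retrieval is acceptable
```

Intelligent-Tiering is the escape hatch when you can’t predict `accesses_per_month` at all: it measures the pattern for you and moves objects between tiers automatically.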
Example Configuration (using AWS CLI to set a lifecycle rule):
Let’s say you want to move logs that are older than 30 days from S3 Standard to S3 Standard-IA, and then to Glacier Flexible Retrieval after 90 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-company-logs \
  --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "ArchiveLogs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" },
        { "NoncurrentDays": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}'
This configuration tells S3 to automatically transition objects under the logs/ prefix to STANDARD_IA after 30 days, and to GLACIER (the Glacier Flexible Retrieval class) after 90 days. The NoncurrentVersionTransitions entries apply the same schedule to noncurrent (superseded) versions of objects in a versioned bucket.
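To see how that schedule plays out, here’s a small sketch (plain Python, no AWS calls) mapping a current object version’s age to the storage class the transitions above would have placed it in:

```python
def storage_class_after(age_days: int) -> str:
    """Which storage class the lifecycle transitions above would have
    a current object version in, given its age in days."""
    if age_days >= 90:
        return "GLACIER"       # Glacier Flexible Retrieval
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"

for age in (10, 45, 120):
    print(age, storage_class_after(age))
```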
Policies: Controlling Access
S3 uses a robust system of policies to control who can access what. There are two main types:
- Bucket Policies: These are JSON documents attached directly to a bucket. They grant or deny permissions to principals (users, services, other AWS accounts) for actions on that bucket and its objects.
- IAM Policies: These are attached to IAM users, groups, or roles. They define what actions those principals are allowed to perform on AWS resources, including S3 buckets.
Example Bucket Policy (Granting public read access to a specific object):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-public-website-assets/index.html"
    }
  ]
}
This policy allows anyone ("Principal": "*") to perform the s3:GetObject action on the index.html file within the my-public-website-assets bucket. Be extremely cautious with public access.
The mental model for S3 policy evaluation is "explicit deny overrides allow." By default, every request is implicitly denied; an action must be allowed by at least one applicable policy (bucket or IAM), and if any policy explicitly denies it, the request is denied regardless of any allows.
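That evaluation order can be sketched as a tiny function. This is a deliberate simplification: real IAM evaluation also involves conditions, permission boundaries, session policies, and organization SCPs.

```python
def evaluate(matched_effects: list[str]) -> str:
    """matched_effects: the Effect ('Allow' or 'Deny') of every policy
    statement that matched the request, across bucket and IAM policies."""
    if "Deny" in matched_effects:
        return "DENIED"       # explicit deny always wins
    if "Allow" in matched_effects:
        return "ALLOWED"      # at least one allow, no explicit deny
    return "DENIED"           # implicit deny: nothing matched

print(evaluate(["Allow", "Deny"]))  # DENIED
print(evaluate(["Allow"]))          # ALLOWED
print(evaluate([]))                 # DENIED
```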
The next logical step after understanding basic access control is to explore S3’s versioning and replication features for data resilience.