# Amazon S3: Complete Object Storage Guide

Amazon S3 (Simple Storage Service) is the backbone of AWS storage. It's where your data lives, whether that's application assets, backups, data lakes, or static website content. S3 is designed for 11 nines of durability (99.999999999%), which AWS translates to an average expected loss of one object every 10,000 years if you store 10 million objects.
## S3 Storage Classes
Choose a storage class based on access patterns and cost. Prices below are approximate us-east-1 rates and change over time, so treat them as relative rather than exact (a sketch for re-tiering a single existing object follows the table):

| Storage Class | Use Case | Availability | Cost/GB-month |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | $0.023 |
| S3 Intelligent-Tiering | Unknown/changing access | 99.9% | $0.023 + monitoring |
| S3 Standard-IA | Infrequent access, quick retrieval | 99.9% | $0.0125 |
| S3 Glacier Instant | Archive with instant access | 99.9% | $0.004 |
| S3 Glacier Flexible | Archive, minutes-hours retrieval | 99.99% | $0.0036 |
| S3 Glacier Deep Archive | Long-term archive, 12+ hour retrieval | 99.99% | $0.00099 |
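
Lifecycle rules (covered below) handle transitions in bulk, but you can also re-tier one existing object with a self-copy. A minimal boto3 sketch, using hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client('s3')

# Re-tier an object in place: copy it onto itself with a new storage class
s3.copy_object(
    Bucket='my-bucket',
    Key='reports/2023.csv',
    CopySource={'Bucket': 'my-bucket', 'Key': 'reports/2023.csv'},
    StorageClass='GLACIER_IR'  # S3 Glacier Instant Retrieval
)
```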
## S3 Operations with Python
```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Upload file
def upload_file(file_path, bucket, key):
    try:
        s3.upload_file(
            file_path, bucket, key,
            ExtraArgs={
                'StorageClass': 'STANDARD_IA',
                'ServerSideEncryption': 'AES256'
            }
        )
        print(f"Uploaded {key}")
    except ClientError as e:
        print(f"Error: {e}")

# Download file
def download_file(bucket, key, local_path):
    s3.download_file(bucket, key, local_path)

# Generate presigned URL (temporary access)
def get_presigned_url(bucket, key, expiration=3600):
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expiration
    )
    return url

# List objects with pagination
def list_all_objects(bucket, prefix=''):
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key']
```
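
Quick usage of the helpers above, with placeholder bucket and key names:

```python
upload_file('report.pdf', 'my-bucket', 'docs/report.pdf')
print(get_presigned_url('my-bucket', 'docs/report.pdf', expiration=900))

for key in list_all_objects('my-bucket', prefix='docs/'):
    print(key)
```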
## S3 Bucket Policies
Control access with bucket policies. The example below allows a CloudFront distribution to read objects and denies any upload that skips server-side encryption:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudFrontAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::123456789:distribution/EDFDVBD6EXAMPLE"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```
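
To attach a policy from code rather than the console, boto3's put_bucket_policy takes the JSON document as a string. A minimal sketch, assuming the policy above is saved as bucket-policy.json:

```python
import boto3

s3 = boto3.client('s3')

# Read the policy document and attach it to the bucket (name is a placeholder)
with open('bucket-policy.json') as f:
    policy_json = f.read()

s3.put_bucket_policy(Bucket='my-bucket', Policy=policy_json)
```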
## S3 Lifecycle Rules
Automate data transitions between storage classes:
```hcl
# Terraform lifecycle configuration
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "log-lifecycle"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }

    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }
}
```
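
If you are not using Terraform, the same rule can be applied from Python; a sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client('s3')

# Mirror of the Terraform rule above, expressed via the S3 API
s3.put_bucket_lifecycle_configuration(
    Bucket='my-log-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'log-lifecycle',
            'Status': 'Enabled',
            'Filter': {'Prefix': 'logs/'},
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 90, 'StorageClass': 'GLACIER'}
            ],
            'Expiration': {'Days': 365},
            'NoncurrentVersionExpiration': {'NoncurrentDays': 30}
        }]
    }
)
```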
## S3 Event Notifications
Trigger Lambda functions, SQS queues, or SNS topics on object events:
```yaml
# CloudFormation: S3 event to Lambda
Resources:
  ImageBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .jpg
            Function: !GetAtt ImageProcessor.Arn
```
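
A complete template also needs an AWS::Lambda::Permission resource so S3 is allowed to invoke the function. On the receiving side, the handler pulls bucket and key out of the event records; a minimal sketch (ImageProcessor's actual image logic is up to you):

```python
from urllib.parse import unquote_plus

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])
        print(f"New image uploaded: s3://{bucket}/{key}")
```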
## S3 Versioning and Replication
🔄 Versioning benefits:
- Recover from accidental deletes (see the sketch below)
- Maintain object history
- Required for cross-region replication
- Works with MFA Delete for extra protection
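
In a versioned bucket, a delete only adds a delete marker; removing that marker restores the object. A minimal boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client('s3')
bucket, key = 'my-bucket', 'docs/report.pdf'  # placeholders

# Find the delete marker currently hiding the object...
versions = s3.list_object_versions(Bucket=bucket, Prefix=key)
for marker in versions.get('DeleteMarkers', []):
    if marker['Key'] == key and marker['IsLatest']:
        # ...and remove it; the previous version becomes current again
        s3.delete_object(Bucket=bucket, Key=key, VersionId=marker['VersionId'])
```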
### Cross-Region Replication
```bash
# Enable replication via AWS CLI
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789:role/S3ReplicationRole",
    "Rules": [{
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": {"Status": "Disabled"},
      "Filter": {"Prefix": ""},
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }]
  }'
```
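
Replication requires versioning on both the source and destination buckets, so enable it first; a quick boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client('s3')

# Versioning is a hard prerequisite on both sides of replication
for bucket in ('source-bucket', 'destination-bucket'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'}
    )
```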
## S3 Select: Query Data In-Place
Query CSV, JSON, or Parquet files in place without downloading the whole object. (Note: AWS has since stopped offering S3 Select to new customers, so confirm it is available on your account before building on it.)
```python
import boto3

s3 = boto3.client('s3')

response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data/sales.csv',
    ExpressionType='SQL',
    # CSV fields are read as strings, so cast before numeric comparison
    Expression="""
        SELECT s.product, s.quantity, s.price
        FROM s3object s
        WHERE CAST(s.quantity AS INTEGER) > 100
    """,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'JSON': {}}
)

# The response payload is an event stream; results arrive in Records chunks
for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode())
```
## Security Best Practices
- ✅ Block public access by default (S3 Block Public Access; see the sketch after this list)
- ✅ Enable server-side encryption (SSE-S3 or SSE-KMS)
- ✅ Use bucket policies, not ACLs (ACLs are legacy)
- ✅ Enable access logging for audit trails
- ✅ Use VPC endpoints for private access
- ✅ Enable versioning and MFA Delete for critical buckets
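
Enabling Block Public Access per bucket is a single call in boto3; a minimal sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client('s3')

# Turn on all four Block Public Access settings for one bucket
s3.put_public_access_block(
    Bucket='my-bucket',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)
```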
## Cost Optimization
- Use Intelligent-Tiering: Automatic cost savings for unpredictable access
- Lifecycle rules: Move old data to cheaper storage automatically
- S3 Inventory: Analyze storage usage patterns
- Requester Pays: Shift transfer costs to requesters
- S3 Storage Lens: Get organization-wide storage insights
🎯 Pro Tip: Use S3 Intelligent-Tiering for data with unknown access patterns. It automatically moves objects between access tiers with no retrieval fees or operational overhead; opting in is just a storage-class choice at upload time, as sketched below.
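
A minimal sketch, with placeholder file, bucket, and key names:

```python
import boto3

s3 = boto3.client('s3')

# Objects land in Intelligent-Tiering and move between tiers automatically
s3.upload_file(
    'data.parquet', 'my-bucket', 'analytics/data.parquet',
    ExtraArgs={'StorageClass': 'INTELLIGENT_TIERING'}
)
```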