Amazon S3: Complete Object Storage and Data Management Guide


Amazon S3 (Simple Storage Service) is the backbone of AWS storage. It’s where your data lives—whether that’s application assets, backups, data lakes, or static website hosting. With 11 9’s of durability (99.999999999%), your data is safer in S3 than almost anywhere else.

S3 Storage Classes

Choose the right storage class based on access patterns and cost requirements:

| Storage Class | Use Case | Availability | Cost/GB/month |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | $0.023 |
| S3 Intelligent-Tiering | Unknown/changing access | 99.9% | $0.023 + monitoring |
| S3 Standard-IA | Infrequent access, quick retrieval | 99.9% | $0.0125 |
| S3 Glacier Instant Retrieval | Archive with millisecond access | 99.9% | $0.004 |
| S3 Glacier Flexible Retrieval | Archive, minutes-to-hours retrieval | 99.99% | $0.0036 |
| S3 Glacier Deep Archive | Long-term archive, 12+ hour retrieval | 99.99% | $0.00099 |
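To get a feel for how the classes compare, here is a small back-of-the-envelope calculator using the per-GB prices above (us-east-1 list prices; it deliberately ignores request, retrieval, monitoring, and transfer fees, which matter for the archive tiers):

```python
# Rough storage-only monthly cost, using the per-GB prices from the table.
# Excludes request, retrieval, and transfer charges.
PRICE_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for a given class."""
    return round(size_gb * PRICE_PER_GB[storage_class], 2)

print(monthly_cost(500, "STANDARD"))  # 11.5
print(monthly_cost(500, "DEEP_ARCHIVE"))
```

For 500 GB of rarely-read logs, Deep Archive is roughly 23x cheaper than Standard, which is why lifecycle rules (covered below) pay off so quickly.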

S3 Operations with Python

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Upload file
def upload_file(file_path, bucket, key):
    try:
        s3.upload_file(
            file_path, bucket, key,
            ExtraArgs={
                'StorageClass': 'STANDARD_IA',
                'ServerSideEncryption': 'AES256'
            }
        )
        print(f"Uploaded {key}")
    except ClientError as e:
        print(f"Error: {e}")

# Download file
def download_file(bucket, key, local_path):
    s3.download_file(bucket, key, local_path)

# Generate presigned URL (temporary access)
def get_presigned_url(bucket, key, expiration=3600):
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expiration
    )
    return url

# List objects with pagination
def list_all_objects(bucket, prefix=''):
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key']

S3 Bucket Policies

Control access with bucket policies:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontAccess",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceArn": "arn:aws:cloudfront::123456789:distribution/EDFDVBD6EXAMPLE"
                }
            }
        },
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        }
    ]
}

S3 Lifecycle Rules

Automate data transitions between storage classes:

# Terraform lifecycle configuration
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "log-lifecycle"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }

    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }
}

S3 Event Notifications

Trigger Lambda functions, SQS queues, or SNS topics on object events:

# CloudFormation: S3 event to Lambda
Resources:
  ImageBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .jpg
            Function: !GetAtt ImageProcessor.Arn
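On the receiving end, S3 delivers the event to Lambda as a JSON document with a `Records` list; the bucket and key live under `s3.bucket.name` and `s3.object.key`, and the key arrives URL-encoded. A minimal handler sketch (the image-processing step itself is left as a stub):

```python
from urllib.parse import unquote_plus

def handler(event, context):
    """Minimal S3-event Lambda handler: extract bucket/key per record."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event payload (e.g. spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # ... resize / analyse the image here ...
        processed.append((bucket, key))
    return {"processed": len(processed)}
```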

S3 Versioning and Replication

🔄 Versioning Benefits:

  • Recover from accidental deletes
  • Maintain object history
  • Required for cross-region replication
  • Works with MFA Delete for extra protection

Cross-Region Replication

# Enable replication via AWS CLI
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789:role/S3ReplicationRole",
    "Rules": [{
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": {"Status": "Disabled"},
      "Filter": {"Prefix": ""},
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }]
  }'

S3 Select: Query Data In-Place

Query CSV, JSON, or Parquet files directly in S3 without downloading the whole object. (Note: AWS stopped onboarding new customers to S3 Select in July 2024, so treat it as a legacy feature for existing workloads.)

import boto3

s3 = boto3.client('s3')

response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data/sales.csv',
    ExpressionType='SQL',
    Expression="""
        SELECT s.product, s.quantity, s.price
        FROM s3object s
        WHERE s.quantity > 100
    """,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'JSON': {}}
)

for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode())

Security Best Practices

  • ✅ Block public access by default (S3 Block Public Access)
  • ✅ Enable server-side encryption (SSE-S3 or SSE-KMS)
  • ✅ Use bucket policies, not ACLs (ACLs are legacy)
  • ✅ Enable access logging for audit trails
  • ✅ Use VPC endpoints for private access
  • ✅ Enable versioning and MFA Delete for critical buckets

Cost Optimization

  • Use Intelligent-Tiering: Automatic cost savings for unpredictable access
  • Lifecycle rules: Move old data to cheaper storage automatically
  • S3 Inventory: Analyze storage usage patterns
  • Requester Pays: Shift transfer costs to requesters
  • S3 Storage Lens: Get organization-wide storage insights


🎯 Pro Tip: Use S3 Intelligent-Tiering for data with unknown access patterns. It automatically moves objects between access tiers with no retrieval fees or operational overhead.

Jennifer Walsh

Senior Cloud Solutions Architect with 12 years of experience in AWS, Azure, and GCP. Jennifer has led enterprise migrations for Fortune 500 companies and holds AWS Solutions Architect Professional and DevOps Engineer certifications. She specializes in serverless architectures, container orchestration, and cloud cost optimization. Previously a senior engineer at AWS Professional Services.