
Redshift Serverless: Data Warehousing Without Capacity Planning

Data warehousing has gotten complicated: capacity planning, node sizing, cluster management. Having migrated a dozen companies to Redshift over the past five years, I've developed a clear sense of when serverless makes sense and when it doesn't. Here's what I've learned.

Provisioned Redshift clusters force you into upfront decisions: how many nodes do you need? Which node type? Redshift Serverless throws that guesswork out the window. You run queries, AWS handles the infrastructure. By 2025, it had become the default choice for most new Redshift workloads.

Provisioned vs Serverless

Aspect        Provisioned                 Serverless
Capacity      Choose nodes upfront        Auto-scales 8-512 RPUs
Idle costs    Billed 24/7                 Scales to zero when idle
Scaling       Manual resize               Automatic
Best for      Predictable, steady load    Variable, bursty workloads

Creating a Serverless Namespace

Part of what makes Redshift Serverless so appealing to data engineers is that the setup is actually simpler than a provisioned cluster.

import time

import boto3

client = boto3.client('redshift-serverless')

# Step 1: Create namespace (logical container for databases, users, and IAM roles)
namespace = client.create_namespace(
    namespaceName='analytics',
    adminUsername='admin',
    adminUserPassword='SecurePass123!',  # in production, prefer manageAdminPassword=True (Secrets Manager)
    dbName='warehouse',
    defaultIamRoleArn='arn:aws:iam::123456789012:role/RedshiftRole'
)

# Step 2: Create workgroup (compute layer)
workgroup = client.create_workgroup(
    workgroupName='analytics-compute',
    namespaceName='analytics',
    baseCapacity=32,   # RPUs (8-512)
    maxCapacity=128,   # auto-scale ceiling
    publiclyAccessible=False,
    subnetIds=['subnet-abc123', 'subnet-def456'],
    securityGroupIds=['sg-12345678']
)

# Workgroup creation is asynchronous; the endpoint appears once status is AVAILABLE
while True:
    wg = client.get_workgroup(workgroupName='analytics-compute')['workgroup']
    if wg['status'] == 'AVAILABLE':
        break
    time.sleep(15)

print(f"Endpoint: {wg['endpoint']['address']}")

Query Examples

Probably should have led with this section, honestly. Here’s what you can actually do once it’s running:

-- Load data from S3
COPY sales
FROM 's3://my-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftRole'
FORMAT PARQUET;

-- Query data lake directly (no COPY needed)
SELECT
    product_category,
    SUM(revenue) as total_revenue,
    COUNT(*) as transactions
FROM spectrum.external_sales  -- S3 data via Spectrum
WHERE sale_date >= CURRENT_DATE - 30
GROUP BY product_category
ORDER BY total_revenue DESC;

-- ML in SQL: inference with a function created beforehand by Redshift ML
-- (CREATE MODEL ... FUNCTION ml_predict_customer_churn)
SELECT
    customer_id,
    ml_predict_customer_churn(
        customer_tenure,
        monthly_spend,
        support_tickets
    ) as churn_probability
FROM customers;

Cost Optimization

RPU Pricing (December 2025)

  • Base: $0.36/RPU-hour (on-demand)
  • Compute: Charged only when queries run
  • Storage: $0.024/GB-month (managed storage)

Example: 32 RPU base, 4 hours active/day = ~$1,400/month

Equivalent provisioned: comparable cluster running 24/7 = ~$2,200/month
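
The serverless figure above is easy to sanity-check. Here's a quick sketch using the on-demand rates listed in this section; the 1 TB storage volume is just an illustrative assumption:

base_rpus = 32
price_per_rpu_hour = 0.36       # on-demand rate listed above; varies by region
active_hours_per_day = 4
days_per_month = 30

compute = base_rpus * price_per_rpu_hour * active_hours_per_day * days_per_month
print(f"Compute: ~${compute:,.0f}/month")             # ~$1,382

storage_gb = 1_000                                     # hypothetical 1 TB of managed storage
print(f"Storage: ~${storage_gb * 0.024:,.0f}/month")   # ~$24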

2025 Features

  • AI-driven scaling: Predicts query complexity, pre-scales
  • Cross-account data sharing: Share tables without copying
  • Streaming ingestion: Real-time from Kinesis/MSK
  • Zero-ETL: Query Aurora/RDS without data movement
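
The streaming ingestion item above is worth a quick illustration, because the setup is only two DDL statements: an external schema that points at the stream, and an auto-refreshing materialized view that ingests it. Here's a rough sketch submitted through the Data API against the workgroup created earlier; the stream name (clickstream), view name, and IAM role are placeholders:

import boto3

data_api = boto3.client('redshift-data')

# External schema that maps a Kinesis data stream into Redshift
create_schema = """
CREATE EXTERNAL SCHEMA kinesis_events
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftRole';
"""

# Materialized view that ingests the stream; AUTO REFRESH pulls new records continuously
create_view = """
CREATE MATERIALIZED VIEW clickstream_events AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
FROM kinesis_events."clickstream";
"""

for sql in (create_schema, create_view):
    response = data_api.execute_statement(
        WorkgroupName='analytics-compute',
        Database='warehouse',
        Sql=sql,
    )
    print(f"Submitted statement {response['Id']}")

Once the view exists, you query it like any other table, and AUTO REFRESH keeps pulling new records from the stream.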

Redshift Serverless is where AWS is putting most of its data warehousing innovation. Provisioned clusters still make sense for predictable, high-throughput workloads that run continuously. But for most companies? Serverless is the better choice now.
