Redshift Serverless: Data Warehousing Without Capacity Planning
Data warehousing has gotten complicated, with capacity planning, node sizing, and cluster management all competing for your attention. Having migrated a dozen companies to Redshift over the past five years, I've learned a lot about when serverless makes sense and when it doesn't. Today I'll share it with you.
Provisioned Redshift clusters force you into upfront decisions: how many nodes do you need? What instance type? Redshift Serverless throws that guesswork out the window. You run queries; AWS handles the infrastructure. By 2025 it had become the default choice for most new Redshift workloads.
Provisioned vs Serverless
| Aspect | Provisioned | Serverless |
|---|---|---|
| Capacity | Choose nodes upfront | Auto-scales 8-512 RPUs |
| Idle costs | Billed 24/7 | No compute charge when idle |
| Scaling | Manual resize | Automatic |
| Best for | Predictable, steady load | Variable, bursty workloads |
Creating a Namespace and Workgroup
This is part of what makes Redshift Serverless appealing to us data engineers: the setup is actually simpler than a provisioned cluster.
```python
import boto3

client = boto3.client('redshift-serverless')

# Step 1: Create namespace (logical container for databases, users, and IAM roles)
namespace = client.create_namespace(
    namespaceName='analytics',
    adminUsername='admin',
    adminUserPassword='SecurePass123!',  # use Secrets Manager in production
    dbName='warehouse',
    defaultIamRoleArn='arn:aws:iam::123456789012:role/RedshiftRole'
)

# Step 2: Create workgroup (the compute layer attached to the namespace)
workgroup = client.create_workgroup(
    workgroupName='analytics-compute',
    namespaceName='analytics',
    baseCapacity=32,   # RPUs (8-512, in increments of 8)
    maxCapacity=128,   # auto-scale ceiling
    publiclyAccessible=False,
    subnetIds=['subnet-abc123', 'subnet-def456'],
    securityGroupIds=['sg-12345678']
)

# Note: the endpoint may not be populated until the workgroup is AVAILABLE
print(f"Endpoint: {workgroup['workgroup']['endpoint']['address']}")
```
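One gotcha: `create_workgroup` returns while the workgroup is still provisioning, so the endpoint in that response may be missing. A small polling helper bridges the gap — a sketch, where `get_workgroup` is the real boto3 call but the timeout and interval values are arbitrary choices of mine:

```python
import time

def wait_for_workgroup(client, name, timeout=600, interval=15):
    """Poll until the workgroup reports AVAILABLE; return its description.

    client is a boto3 'redshift-serverless' client (passed in so the
    helper is easy to stub out in tests)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        wg = client.get_workgroup(workgroupName=name)['workgroup']
        if wg['status'] == 'AVAILABLE':
            return wg
        time.sleep(interval)
    raise TimeoutError(f"workgroup {name!r} not AVAILABLE after {timeout}s")
```

Call it as `wait_for_workgroup(client, 'analytics-compute')` right after `create_workgroup`, then read the endpoint from the returned description.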
Query Examples
Here's what you can actually do once it's running:
```sql
-- Load data from S3
COPY sales
FROM 's3://my-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftRole'
FORMAT PARQUET;

-- Query the data lake directly via Spectrum (no COPY needed)
SELECT
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(*) AS transactions
FROM spectrum.external_sales
WHERE sale_date >= CURRENT_DATE - 30
GROUP BY product_category
ORDER BY total_revenue DESC;

-- ML in SQL (assumes a churn model already trained with CREATE MODEL)
SELECT
    customer_id,
    ml_predict_customer_churn(
        customer_tenure,
        monthly_spend,
        support_tickets
    ) AS churn_probability
FROM customers;
```
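You don't even need a JDBC connection to run these: the Redshift Data API submits SQL to a workgroup over HTTPS. Here's a hedged sketch of a helper — `execute_statement`, `describe_statement`, and `get_statement_result` are real boto3 `redshift-data` operations; taking the client as a parameter is my choice, to keep the helper easy to stub:

```python
import time

def run_sql(data_client, sql, workgroup='analytics-compute', database='warehouse'):
    """Run one statement via the Redshift Data API and return its rows.

    data_client is a boto3 'redshift-data' client."""
    stmt = data_client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql)
    # The Data API is asynchronous: poll until the statement settles
    while True:
        desc = data_client.describe_statement(Id=stmt['Id'])
        if desc['Status'] in ('FINISHED', 'FAILED', 'ABORTED'):
            break
        time.sleep(1)
    if desc['Status'] != 'FINISHED':
        raise RuntimeError(desc.get('Error', desc['Status']))
    if desc.get('HasResultSet'):
        return data_client.get_statement_result(Id=stmt['Id'])['Records']
    return []
```

Usage would look like `run_sql(boto3.client('redshift-data'), 'SELECT 1')`, which returns the Data API's list of records.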
Cost Optimization
RPU Pricing (December 2025)
- Base: $0.36/RPU-hour (on-demand)
- Compute: Charged only when queries run
- Storage: $0.024/GB-month (managed storage)
Example: 32 RPUs × 4 active hours/day × 30 days × $0.36 = ~$1,382/month
Rough provisioned equivalent: a 2-node ra3.xlplus cluster at ~$1.086/node-hour runs ~$1,600/month, billed 24/7 whether or not queries run
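The serverless figure is easy to sanity-check (rates as quoted above; this ignores the per-query minimum charge and managed-storage costs):

```python
def serverless_monthly_cost(rpus, active_hours_per_day,
                            rpu_hour_rate=0.36, days=30):
    """Rough monthly compute cost: RPUs x active hours x rate."""
    return rpus * active_hours_per_day * days * rpu_hour_rate

cost = serverless_monthly_cost(32, 4)
print(f"~${cost:,.0f}/month")  # ~$1,382/month
```

The break-even point against a provisioned cluster depends entirely on active hours: the fewer hours per day your warehouse is busy, the more serverless pulls ahead.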
2025 Features
- AI-driven scaling: Predicts query complexity, pre-scales
- Cross-account data sharing: Share tables without copying
- Streaming ingestion: Real-time from Kinesis/MSK
- Zero-ETL: Query Aurora/RDS without data movement
Redshift Serverless is where AWS is focusing its data warehousing investment. Provisioned clusters still make sense for predictable, high-throughput workloads that run continuously. But for most companies? Serverless is now the better default.