# Bedrock Guardrails: Preventing AI Hallucinations in Production
AI hallucinations cost companies money and trust. Amazon Bedrock Guardrails provides enterprise-grade controls to filter harmful content, enforce topic restrictions, and ground responses in facts. Here’s how to implement them.
## Guardrails Capabilities
| Feature | What It Does |
|---|---|
| Content Filters | Block hate, violence, sexual content, profanity |
| Topic Blocking | Deny discussions on specified subjects |
| PII Redaction | Mask SSN, credit cards, phone numbers |
| Word Filters | Block competitor names, internal terms |
| Grounding | Verify responses against source documents |
## Creating a Guardrail
```python
import boto3

bedrock = boto3.client('bedrock')

# Create guardrail
guardrail = bedrock.create_guardrail(
    name='customer-support-guardrail',
    description='Guardrails for customer-facing chatbot',
    # Content filtering
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
        ]
    },
    # Topic restrictions
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Competitor Discussion',
                'definition': 'Any mention of competitor products or pricing',
                'type': 'DENY'
            },
            {
                'name': 'Financial Advice',
                'definition': 'Investment or financial planning recommendations',
                'type': 'DENY'
            }
        ]
    },
    # PII handling
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'US_SOCIAL_SECURITY_NUMBER', 'action': 'ANONYMIZE'},
            {'type': 'CREDIT_DEBIT_CARD_NUMBER', 'action': 'BLOCK'},
        ]
    },
    blockedInputMessaging='I cannot help with that request.',
    blockedOutputsMessaging='I cannot provide that information.'
)
```
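The capabilities table also lists word filters, which the same `create_guardrail` call accepts via a `wordPolicyConfig`. A minimal sketch, with placeholder terms standing in for your own blocklist:

```python
# Placeholder terms -- substitute real competitor names / internal codenames
word_policy_config = {
    'wordsConfig': [
        {'text': 'CompetitorX'},
        {'text': 'project-internal-codename'},
    ],
    # AWS-managed profanity word list
    'managedWordListsConfig': [
        {'type': 'PROFANITY'},
    ],
}
```

Pass it as `wordPolicyConfig=word_policy_config` alongside the other policy configs in the call above.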
## Using Guardrails with Invoke
```python
import json

import boto3

runtime = boto3.client('bedrock-runtime')

response = runtime.invoke_model(
    modelId='amazon.nova-pro-v1:0',
    guardrailIdentifier=guardrail['guardrailId'],  # ID returned by create_guardrail
    guardrailVersion='DRAFT',
    # Nova models use the messages schema, not Titan-style inputText
    body=json.dumps({
        'messages': [{'role': 'user', 'content': [{'text': user_question}]}]
    })
)

# Check if the guardrail intervened: the action is reported inside the
# response body, not as a top-level field on the boto3 response
result = json.loads(response['body'].read())
if result.get('amazon-bedrock-guardrailAction') == 'INTERVENED':
    print("Guardrail blocked this request")
```
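If you use the Converse API instead of `invoke_model`, the guardrail attaches through a `guardrailConfig` dict and you avoid per-model body schemas. A sketch (the guardrail ID is a placeholder, and `converse_with_guardrail` is our own helper name):

```python
def build_messages(question: str) -> list:
    # Converse message shape, shared across Bedrock models
    return [{'role': 'user', 'content': [{'text': question}]}]

def converse_with_guardrail(runtime, model_id: str, question: str) -> dict:
    # runtime = boto3.client('bedrock-runtime')
    return runtime.converse(
        modelId=model_id,
        messages=build_messages(question),
        guardrailConfig={
            'guardrailIdentifier': 'my-guardrail-id',  # placeholder ID
            'guardrailVersion': 'DRAFT',
        },
    )
```

When the guardrail blocks content, the Converse response reports `stopReason` of `'guardrail_intervened'`, so the check is a single field comparison rather than parsing the body.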
## Grounding Checks (Anti-Hallucination)
```python
# Grounding verifies the response against source documents. It requires a
# contextual grounding policy (contextualGroundingPolicyConfig) on the
# guardrail, and the source, query, and response are passed as qualified
# content blocks -- apply_guardrail has no separate groundingSource field.
response = runtime.apply_guardrail(
    guardrailIdentifier='my-guardrail',
    guardrailVersion='1',
    source='OUTPUT',
    content=[
        # Source documents to verify against
        {'text': {'text': source_documents, 'qualifiers': ['grounding_source']}},
        # The user's original question
        {'text': {'text': user_question, 'qualifiers': ['query']}},
        # The model response being checked
        {'text': {'text': model_response, 'qualifiers': ['guard_content']}},
    ]
)

if response['action'] == 'GUARDRAIL_INTERVENED':
    print("Response failed the grounding check")
```
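For the grounding check to run at all, the guardrail must be created (or updated) with a contextual grounding policy. A sketch of that config; the 0.75 thresholds are an assumed starting point to tune, not a recommendation from AWS:

```python
# Thresholds are 0-1; higher is stricter. 0.75 is an arbitrary starting point.
contextual_grounding_config = {
    'filtersConfig': [
        # Is the response supported by the grounding source?
        {'type': 'GROUNDING', 'threshold': 0.75},
        # Is the response relevant to the user's query?
        {'type': 'RELEVANCE', 'threshold': 0.75},
    ]
}
```

Pass it as `contextualGroundingPolicyConfig` in `create_guardrail` or `update_guardrail`.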
## Best Practice
Always apply guardrails to both inputs AND outputs. Users can craft prompts to bypass input filters, so output filtering is your last line of defense.
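One way to wire up both directions with `apply_guardrail`, sketched below; the function names and canned refusal strings are ours, not part of the SDK:

```python
def guardrail_blocked(apply_guardrail_response: dict) -> bool:
    # apply_guardrail reports 'GUARDRAIL_INTERVENED' when content is blocked
    return apply_guardrail_response.get('action') == 'GUARDRAIL_INTERVENED'

def screened_answer(runtime, guardrail_id: str, version: str,
                    question: str, answer: str) -> str:
    # Screen the user input first...
    inp = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id, guardrailVersion=version,
        source='INPUT', content=[{'text': {'text': question}}],
    )
    if guardrail_blocked(inp):
        return 'I cannot help with that request.'
    # ...then screen the model output before returning it
    out = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id, guardrailVersion=version,
        source='OUTPUT', content=[{'text': {'text': answer}}],
    )
    if guardrail_blocked(out):
        return 'I cannot provide that information.'
    return answer
```

Running the same guardrail on both sides costs an extra API call per turn, but it is what catches a jailbroken prompt that slipped past the input filters.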