Integration with AWS API Gateway
AWS API Gateway can be configured to trust X.509 certificates presented by Defakto-enabled workloads. Defakto utilizes short-lived root certificates and automates their creation and deprecation within the Defakto components.
When configuring an external service such as API Gateway, the current valid set of trust anchors must be synchronized into the API Gateway configuration as new root certificates are introduced and old ones expire.
Defakto provides a tool called spirl-sync that automatically maintains synchronization between the AWS API Gateway mutual TLS trust store and the Defakto root certificates.
What is spirl-sync?
spirl-sync is a lightweight tool that:
- Monitors your Defakto trust domain federation endpoint for certificate changes
- Automatically downloads new root certificates
- Updates your AWS API Gateway's trust store with the latest certificates
- Removes expired certificates from the AWS API Gateway's trust store
This ensures your API Gateway always trusts the current Defakto certificates without manual intervention.
Running under AWS Lambda using Terraform
This guide shows how to deploy spirl-sync as an AWS Lambda function using Terraform. The Lambda function runs periodically to keep your API Gateway's trust store synchronized with Defakto's root certificates.
What you'll need
Before starting, ensure you have:
- An internal Amazon Elastic Container Registry (ECR) repository located in the same region as your planned AWS Lambda function
- An active Defakto trust domain
- Basic familiarity with Terraform and AWS Lambda
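If the ECR repository does not exist yet, it can be created with Terraform alongside the rest of the configuration in this guide. The following is a minimal sketch rather than part of the Defakto-provided configuration: the resource label spirl_sync, the output name, and the tag-mutability and scanning settings are assumptions, and the repository name spirl-sync is chosen only to match the image path used in the steps that follow.

```hcl
# Hypothetical ECR repository for the spirl-sync image (an assumption, not from Defakto).
# Terraform creates it in the AWS provider's configured region, which should be the
# region where the Lambda function will run.
resource "aws_ecr_repository" "spirl_sync" {
  name = "spirl-sync"

  # Optional hardening choices; adjust to your organization's standards.
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

# Registry path to use when tagging and pushing the image
output "spirl_sync_repository_url" {
  value = aws_ecr_repository.spirl_sync.repository_url
}
```

The repository_url output has the form AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com/spirl-sync, which is the value to substitute for the sample hostname and path.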
Step 1: Copy the container image to your ECR repository
First, copy the spirl-sync container image from Defakto's registry to your internal ECR repository. Replace the sample hostname 11111111111.dkr.ecr.us-west-2.amazonaws.com with your own ECR registry hostname (your default private registry is AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com):
docker pull ghcr.io/spirl/spirl-sync:0.1.7
docker tag ghcr.io/spirl/spirl-sync:0.1.7 11111111111.dkr.ecr.us-west-2.amazonaws.com/spirl-sync:0.1.7
docker push 11111111111.dkr.ecr.us-west-2.amazonaws.com/spirl-sync:0.1.7
Step 2: Gather your Defakto configuration information
You'll need to find the SPIFFE Bundle Endpoint for your trust domain. This endpoint provides the certificates that spirl-sync will synchronize to API Gateway.
Run this command to get your trust domain information:
spirlctl trust-domain info example.com
Example output:
spirlctl trust-domain info example.com
Getting Trust Domain Info
ID td-d3ornt0mnw
Name: example.com
Status: available
Self-Managed: false
SPIRL Agent Endpoint: td-d3ornt0mnw.agent.spirl.com:443
SPIFFE Bundle Endpoint: https://fed.spirl.org/t-su8rvkjgix/td-d3ornt0mnw/bundle
JWT Issuer: https://fed.spirl.org/t-su8rvkjgix/td-d3ornt0mnw
JWKS Endpoint: https://fed.spirl.org/t-su8rvkjgix/td-d3ornt0mnw/jwks
OIDC Discovery Endpoint: https://fed.spirl.org/t-su8rvkjgix/td-d3ornt0mnw/.well-known/openid-configuration
Created At: 2025-01-08 14:50:10.306 +0000 UTC
Last Updated At: 2025-04-07 22:06:45.711 +0000 UTC
From this output, copy the SPIFFE Bundle Endpoint URL (in this example: https://fed.spirl.org/t-su8rvkjgix/td-d3ornt0mnw/bundle) - you'll need this for the Lambda configuration.
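Because the Lambda configuration in Step 3 takes the endpoints as a comma-separated string, one option is to keep them in a Terraform list and join them. This is a sketch under stated assumptions: the variable name bundle_endpoints and the local bundle_endpoints_csv are hypothetical, and the validation only checks that each entry looks like an https:// URL.

```hcl
# Hypothetical input variable (the name is an assumption) holding one or more
# SPIFFE Bundle Endpoint URLs copied from `spirlctl trust-domain info`.
variable "bundle_endpoints" {
  type        = list(string)
  description = "SPIFFE Bundle Endpoint URLs to synchronize into API Gateway"

  validation {
    condition = length(var.bundle_endpoints) > 0 && alltrue([
      for e in var.bundle_endpoints : can(regex("^https://", e))
    ])
    error_message = "Provide at least one bundle endpoint, and make sure each one is an https:// URL."
  }
}

locals {
  # spirl-sync reads BUNDLE_ENDPOINTS as a comma-separated list
  bundle_endpoints_csv = join(",", var.bundle_endpoints)
}
```

With this in place, the BUNDLE_ENDPOINTS environment variable could be set to local.bundle_endpoints_csv instead of a hard-coded URL.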
Step 3: Create the Lambda function with Terraform
This Terraform configuration creates a Lambda function that runs spirl-sync to keep your API Gateway synchronized with Defakto certificates.
Important values to customize:
- Replace 11111111111.dkr.ecr.us-west-2.amazonaws.com/spirl-sync:v0.0.0 with your actual ECR repository details
- Update the BUNDLE_ENDPOINTS value with your SPIFFE Bundle Endpoint from Step 2
- Set API_GATEWAY_ID to reference your API Gateway
- Configure S3_BUCKET_NAME and DOMAIN_NAME for your environment
resource "aws_lambda_function" "spirl_sync" {
  function_name = "spirl-sync"
  role          = aws_iam_role.spirl_sync_lambda_exec.arn
  package_type  = "Image"
  # Replace this with the full image URI you pushed in Step 1
  image_uri = "11111111111.dkr.ecr.us-west-2.amazonaws.com/spirl-sync:v0.0.0"
  # arm64 is used here; x86_64 (amd64) can also be used
  architectures = ["arm64"]
  # Lambda functions using container images don't use the handler and runtime parameters
  # as they are defined in the container
  # Note: Initial configuration of an API Gateway Domain Name for mTLS can take more than 5 minutes
  timeout     = 600
  memory_size = 128
  # Environment variables configure how spirl-sync operates
  environment {
    variables = {
      # SYNC_TARGET tells spirl-sync that we are configuring an API gateway
      SYNC_TARGET = "apigateway"
      # BUNDLE_ENDPOINTS is a comma-separated list of federation endpoints from Defakto.
      # These endpoints are the source of the certificates; every certificate served by
      # each endpoint is included.
      # Replace this with your SPIFFE bundle endpoint from Step 2
      BUNDLE_ENDPOINTS = "https://fed.spirl.org/t-aaaaaaaaaa/td-bbbbbbbbbb/bundle"
      # API_GATEWAY_ID is the API gateway this Lambda function should be configuring
      API_GATEWAY_ID = aws_apigatewayv2_api.api_gateway.id
      # S3_BUCKET_NAME is the S3 bucket that will be used to save the trust store
      S3_BUCKET_NAME = aws_s3_bucket.api_gateway_trust_store.bucket
      # S3_BUNDLE_KEY is the path within the bucket to save the trust store
      S3_BUNDLE_KEY = "bundle.pem"
      # DOMAIN_NAME is the domain name configuration that will be attached to the API gateway
      DOMAIN_NAME = var.gateway_domain
    }
  }
  depends_on = [aws_cloudwatch_log_group.spirl_sync]
}
# CloudWatch Log Group for Lambda function
resource "aws_cloudwatch_log_group" "spirl_sync" {
  name              = "/aws/lambda/spirl-sync"
  retention_in_days = 30
  # Optional: Add tags as needed
  tags = {
    Service = "spirl-sync"
  }
}
# Execution Role for the spirl_sync Lambda function
resource "aws_iam_role" "spirl_sync_lambda_exec" {
  name = "spirl_sync_lambda_exec_role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}
# Attach the AWSLambdaBasicExecutionRole to the Lambda function exec role
resource "aws_iam_role_policy_attachment" "spirl_sync_lambda_exec" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
  role       = aws_iam_role.spirl_sync_lambda_exec.name
}
# Set exec permissions on the Lambda function
# The Lambda needs permissions to:
# - Pull the spirl-sync container image from ECR
# - Access the S3 bucket to store the trust store needed by API Gateway
# - Update the API Gateway configuration with new certificate versions
resource "aws_iam_role_policy" "spirl_sync_lambda_exec" {
  name = "spirl_sync_lambda_exec_ecr_access"
  role = aws_iam_role.spirl_sync_lambda_exec.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:BatchCheckLayerAvailability"
        ]
        # Replace with the ARN for the ECR repository holding the spirl-sync image
        Resource = "arn:aws:ecr:us-west-2:11111111111:repository/spirl-sync"
      },
      {
        # ecr:GetAuthorizationToken does not support resource-level permissions,
        # so it has to be granted on all resources
        Effect   = "Allow"
        Action   = "ecr:GetAuthorizationToken"
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:ListBucket",
          "s3:GetObjectVersion",
        ]
        Resource = [
          # Replace with the S3 bucket arn / path that will hold the trust store
          aws_s3_bucket.api_gateway_trust_store.arn,
          "${aws_s3_bucket.api_gateway_trust_store.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "apigateway:GET",
          "apigateway:PATCH",
          "apigateway:PUT",
          "apigateway:POST",
          "apigateway:UPDATE",
          "apigateway:AddCertificateToDomain",
          "apigateway:RemoveCertificateFromDomain"
        ]
        # Replace with the ARN of the domain name that will be doing the mTLS termination
        Resource = "arn:aws:apigateway:us-west-2::/domainnames/example.dev.spirl.net"
      }
    ]
  })
}
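The configuration above refers to var.gateway_domain and aws_s3_bucket.api_gateway_trust_store without defining them. A minimal sketch of those definitions could look like the following. The bucket name is a placeholder, and enabling versioning is an assumption inferred from the versionID values that spirl-sync logs on upload and from the s3:GetObjectVersion permission in the policy; the public access block is an optional hardening choice, not something the guide requires.

```hcl
# Hypothetical definitions for names the Step 3 configuration references.
variable "gateway_domain" {
  type        = string
  description = "Custom domain name of the API Gateway that terminates mTLS"
}

resource "aws_s3_bucket" "api_gateway_trust_store" {
  # Bucket names are globally unique; this value is a placeholder.
  bucket = "example-api-gateway-trust-store"
}

# Versioning gives each uploaded bundle its own version ID
resource "aws_s3_bucket_versioning" "api_gateway_trust_store" {
  bucket = aws_s3_bucket.api_gateway_trust_store.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Optional: keep the trust store bucket private
resource "aws_s3_bucket_public_access_block" "api_gateway_trust_store" {
  bucket = aws_s3_bucket.api_gateway_trust_store.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

The aws_apigatewayv2_api.api_gateway resource is assumed to already exist in your configuration, since it describes your own API.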
Step 4: Test the Lambda function
Once deployed, test your Lambda function to ensure it's working correctly:
- Open the AWS Lambda console
- Find your spirl-sync function
- Click the Test button
- No test parameters are needed - just click Test and monitor the execution logs
The function should complete successfully and you should see logs indicating that it synchronized certificates with your API Gateway.
Step 5: Set up automatic synchronization
To keep your certificates current, configure the Lambda function to run automatically on a schedule. This example runs every 5 minutes, but you can adjust the frequency based on your security requirements.
# EventBridge rule to trigger the Lambda function on a schedule
resource "aws_cloudwatch_event_rule" "spirl_sync_schedule" {
  name        = "spirl-sync-schedule"
  description = "Schedule for running the spirl-sync function"
  # Runs every 5 minutes - adjust frequency as needed
  schedule_expression = "rate(5 minutes)"
}
# EventBridge target that points to the Lambda function
resource "aws_cloudwatch_event_target" "spirl_sync_target" {
  rule      = aws_cloudwatch_event_rule.spirl_sync_schedule.name
  target_id = "spirl_sync_lambda"
  arn       = aws_lambda_function.spirl_sync.arn
}
# Permission for EventBridge to invoke the Lambda function
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.spirl_sync.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.spirl_sync_schedule.arn
}
Step 6: Handle Terraform conflicts
When you manage your API Gateway domain name with Terraform, you need to prevent conflicts between Terraform and spirl-sync. Both tools try to manage the same configuration, and an outage occurs when Terraform removes the mTLS configuration that spirl-sync added.
The solution is to tell Terraform to ignore changes that spirl-sync makes to the mutual TLS configuration. spirl-sync only manages the mutual_tls_authentication portion of your domain configuration, so it's safe to ignore these changes in Terraform.
resource "aws_apigatewayv2_domain_name" "api_gateway" {
  domain_name = "example.com"
  domain_name_configuration {
    certificate_arn = "arn://example/..."
    endpoint_type   = "REGIONAL"
    security_policy = "TLS_1_2"
  }
  # This lifecycle rule prevents Terraform from trying to remove the mutual_tls_authentication
  # configuration that spirl-sync adds. Without this, Terraform would remove the mTLS
  # configuration on each run, causing service outages.
  lifecycle {
    ignore_changes = [mutual_tls_authentication]
  }
}
Monitoring
When running spirl-sync on AWS Lambda, monitoring ensures your certificate synchronization remains healthy. AWS Lambda automatically publishes invocation metrics to CloudWatch, making it straightforward to track the health of your spirl-sync deployment.
Key metrics to monitor
AWS Lambda provides built-in metrics. Focus on these two for spirl-sync:
Invocations
The Invocations metric tracks how many times your Lambda function is triggered. Since spirl-sync should run on a regular schedule, a lack of invocations indicates a problem with your EventBridge schedule or Lambda configuration.
What to monitor: Zero invocations (or missing data) over a period twice your schedule interval indicates the function isn't running.
Example alarm setup:
- Metric: Invocations
- Statistic: Sum
- Period: 10 minutes (for a 5-minute schedule)
- Threshold: Less than 1
- Datapoints to alarm: 1 out of 1
- Treat missing data as: Breaching
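The alarm settings above map onto a Terraform aws_cloudwatch_metric_alarm resource roughly as follows. This is a sketch that assumes the Lambda function is the aws_lambda_function.spirl_sync resource from Step 3; the alarm name and the commented-out SNS topic are hypothetical.

```hcl
# Alarm when spirl-sync has not run at all in a 10-minute window
resource "aws_cloudwatch_metric_alarm" "spirl_sync_no_invocations" {
  alarm_name        = "spirl-sync-no-invocations" # hypothetical name
  alarm_description = "spirl-sync was not invoked in the last 10 minutes"

  namespace   = "AWS/Lambda"
  metric_name = "Invocations"
  statistic   = "Sum"

  dimensions = {
    FunctionName = aws_lambda_function.spirl_sync.function_name
  }

  period              = 600 # 10 minutes, twice the 5-minute schedule
  evaluation_periods  = 1
  datapoints_to_alarm = 1
  comparison_operator = "LessThanThreshold"
  threshold           = 1

  # Lambda publishes no datapoint when there are no invocations,
  # so missing data has to count as a breach
  treat_missing_data = "breaching"

  # alarm_actions = [aws_sns_topic.alerts.arn] # hypothetical notification target
}
```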
Errors
The Errors metric counts failed Lambda invocations. spirl-sync will fail if it cannot download bundles, access S3, or update API Gateway. Any error means your API Gateway trust store may not be in sync with the trust domain servers.
What to monitor: Any errors indicate a problem that needs immediate attention.
Example alarm setup:
- Metric: Errors
- Statistic: Sum
- Period: 5 minutes
- Threshold: Greater than 0
- Datapoints to alarm: 2 out of 3 (tolerates transient failures)
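Expressed in Terraform, the error alarm could be sketched like this. It again assumes the aws_lambda_function.spirl_sync resource from Step 3, and the alarm name is hypothetical. The treat_missing_data value is an assumption, since the settings above do not specify one; notBreaching keeps this alarm quiet when the function does not run, leaving that case to the invocations alarm.

```hcl
# Alarm when spirl-sync fails in 2 of the last 3 five-minute periods
resource "aws_cloudwatch_metric_alarm" "spirl_sync_errors" {
  alarm_name        = "spirl-sync-errors" # hypothetical name
  alarm_description = "spirl-sync invocations are failing"

  namespace   = "AWS/Lambda"
  metric_name = "Errors"
  statistic   = "Sum"

  dimensions = {
    FunctionName = aws_lambda_function.spirl_sync.function_name
  }

  period              = 300
  evaluation_periods  = 3 # the "out of 3" window
  datapoints_to_alarm = 2 # tolerates a single transient failure
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0

  # Assumption: a period with no invocations is not treated as an error here
  treat_missing_data = "notBreaching"
}
```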
The following Lambda metrics are not relevant for spirl-sync:
- Throttles - spirl-sync runs infrequently and won't hit Lambda concurrency limits
- Duration - execution time varies based on API Gateway update status; no fixed baseline
- ConcurrentExecutions - spirl-sync runs sequentially by design
- DestinationDeliveryFailures - spirl-sync doesn't use async destinations
- IteratorAge - only applies to stream-based invocations
Testing your alarms
Before relying on your monitoring, verify that alarms trigger correctly. The easiest way is to intentionally misconfigure the Lambda function.
Test 1: Invalid domain name
Update your Lambda function's DOMAIN_NAME environment variable to an invalid value:
# Temporarily modify for testing
environment {
  variables = {
    DOMAIN_NAME = "invalid-domain.example.com.not" # Add .not to break it
    # ... other variables
  }
}
Apply the change and wait for the next scheduled invocation. You should see an error like:
{
"errorMessage": "failed to get current APIGateway Domain Name Configuration: getting domain name configuration: operation error API Gateway: GetDomainName, https response error StatusCode: 403, RequestID: b7d87075-f2ea-45aa-90fe-5973e9228914, api error AccessDeniedException: User: arn:aws:sts::438253597286:assumed-role/spirl_sync_lambda_role/spirl-sync-function is not authorized to perform: apigateway:GET on resource: arn:aws:apigateway:us-west-2::/domainnames/invalid-domain.example.com.not because no identity-based policy allows the apigateway:GET action",
"errorType": "wrapError"
}
Your error alarm should trigger after 2 failed invocations.
Test 2: Invalid bundle endpoint
Update the BUNDLE_ENDPOINTS variable to an invalid URL:
environment {
  variables = {
    BUNDLE_ENDPOINTS = "https://fed.spir.orgt-broken/td-test/bundle" # Typo in domain
    # ... other variables
  }
}
This will cause a DNS resolution failure:
{
"errorMessage": "failed to download and combine bundles: downloading bundle from https://fed.spir.orgt-broken/td-test/bundle: fetching bundle from https://fed.spir.orgt-broken/td-test/bundle: Get \"https://fed.spir.orgt-broken/td-test/bundle\": dial tcp: lookup fed.spir.orgt-broken on 169.254.78.1:53: no such host",
"errorType": "wrapError"
}
Test 3: No invocations (schedule disabled)
Temporarily disable the EventBridge rule to test the invocations alarm:
resource "aws_cloudwatch_event_rule" "spirl_sync_schedule" {
  name                = "spirl-sync-schedule"
  description         = "Schedule for running the spirl-sync function"
  schedule_expression = "rate(5 minutes)"
  is_enabled          = false # Disable temporarily
}
Wait for twice your schedule period (10 minutes for a 5-minute schedule), and your no-invocations alarm should trigger.
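Recent releases of the Terraform AWS provider mark is_enabled as deprecated in favor of a state argument on aws_cloudwatch_event_rule. If your provider version reports a deprecation warning, the same test can be sketched as follows; check the provider documentation for the version you have pinned before relying on it.

```hcl
resource "aws_cloudwatch_event_rule" "spirl_sync_schedule" {
  name                = "spirl-sync-schedule"
  description         = "Schedule for running the spirl-sync function"
  schedule_expression = "rate(5 minutes)"

  # Use state instead of is_enabled; do not set both on the same rule.
  # Set back to "ENABLED" once the alarm test is done.
  state = "DISABLED"
}
```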
Troubleshooting
When spirl-sync encounters issues, start by checking the Lambda function's metrics, then dive into logs for detailed diagnostics.
Check the monitoring dashboard first
To check spirl-sync health:
- Open the AWS Lambda Console
- Select your spirl-sync function
- Click the Monitor tab
On the monitoring dashboard, review:
- Error count and success rate (%) - Any errors indicate a problem that needs attention. A success rate below 100% means some invocations are failing.
- Invocations - Verify the function is being triggered on schedule. If you see gaps or missing invocations, check your EventBridge rule.
If you see errors or unexpected behavior, proceed to the logs for detailed information.
Viewing detailed logs
To access CloudWatch logs:
- From the Monitor tab in Lambda, click View CloudWatch logs
- Select the most recent log stream to see the latest execution
Normal execution flow
Each Lambda invocation logs the complete synchronization process. Here's what spirl-sync does during a normal run:
- Downloads trust bundles - Fetches certificates from each configured federation endpoint
- Combines bundles - If multiple endpoints are configured, merges them into a single bundle
- Converts to PEM - Transforms SPIFFE bundle format to PEM for API Gateway compatibility
- Uploads to S3 - Writes the bundle to S3 (only if changed since last run)
- Updates API Gateway - Patches the domain configuration with the new trust store version
- Monitors deployment - Polls until API Gateway completes the update (status changes from UPDATING to AVAILABLE)
If any step fails, the Lambda function exits with an error, which appears in both the metrics dashboard and the logs.
Example: Successful execution
Here's what a normal, successful execution looks like:
START RequestId: 51ace3ea-c43f-4491-baf8-dfc671feeb85 Version: $LATEST
{"level":"info","ts":1766181437.3107746,"msg":"Processing Lambda invocation for target","target":"apigateway"}
{"level":"info","ts":1766181437.4188654,"msg":"Downloading SPIFFE bundles..."}
{"level":"info","ts":1766181437.8972862,"msg":"Fetching bundle","endpoint":"https://fed.spirl.org/t-e4venep2vy/td-cupczh3ejs/bundle"}
{"level":"info","ts":1766181438.0084076,"msg":"Added certificate from endpoint","endpoint":"https://fed.spirl.org/t-e4venep2vy/td-cupczh3ejs/bundle","signatureAlgorithm":"SHA256-RSA","commonName":"ks_337QvAjIeHBqhpXvE3GSfhOFK99","serialNumber":"98962942105801510547973235236374795840","notAfter":"2025-12-22 22:17:38 +0000 UTC","notBefore":"2025-09-23 22:07:38 +0000 UTC","uris":["spiffe://yannick-test"],"subjectKeyID":"7f:3e:ee:11:2c:18:dd:f3:74:fb:47:84:19:a4:9d:5a:c0:c4:f8:ee"}
{"level":"info","ts":1766181438.008443,"msg":"Added certificate from endpoint","endpoint":"https://fed.spirl.org/t-e4venep2vy/td-cupczh3ejs/bundle","signatureAlgorithm":"SHA256-RSA","commonName":"ks_36a6dH3uswGrR2riKli02YbQPvv","serialNumber":"273327091908013149463677894769291218112","notAfter":"2026-03-08 22:20:20 +0000 UTC","notBefore":"2025-12-08 22:10:20 +0000 UTC","uris":["spiffe://yannick-test"],"subjectKeyID":"b8:d5:e0:2d:67:ec:1b:dc:59:6f:8d:7d:9a:e2:3e:45:de:b0:10:e2"}
{"level":"info","ts":1766181438.0084643,"msg":"Combined certificates into a PEM bundle","count":2}
{"level":"info","ts":1766181438.170083,"msg":"New bundle uploaded to S3","bucket":"spirl-demo-gateway-trust-438253597286-us-west-2","key":"bundle.pem","versionID":"6smL3j44n3sNt1jnncigCU7HSLjiVzF5"}
{"level":"info","ts":1766181438.3680203,"msg":"Updating API Gateway to use new bundle version...","versionID":"6smL3j44n3sNt1jnncigCU7HSLjiVzF5","bucket":"spirl-demo-gateway-trust-438253597286-us-west-2","key":"bundle.pem","domainName":"sandbox.dev.spirl.net"}
{"level":"info","ts":1766181438.368052,"msg":"Updating specified domain name with new truststore","domainName":"sandbox.dev.spirl.net"}
{"level":"info","ts":1766181438.822999,"msg":"Successfully sent patch to API Gateway","domainName":"sandbox.dev.spirl.net","bucketName":"spirl-demo-gateway-trust-438253597286-us-west-2","key":"bundle.pem","versionID":"6smL3j44n3sNt1jnncigCU7HSLjiVzF5"}
{"level":"info","ts":1766181438.8230302,"msg":"Monitoring API Gateway deployment..."}
{"level":"info","ts":1766181438.8782198,"msg":"Domain name isn't available yet, waiting...","domainName":"sandbox.dev.spirl.net","status":"UPDATING","pollInterval":"5s"}
{"level":"info","ts":1766181443.9344957,"msg":"Domain name isn't available yet, waiting...","domainName":"sandbox.dev.spirl.net","status":"UPDATING","pollInterval":"5s"}
{"level":"info","ts":1766181448.990368,"msg":"Domain name isn't available yet, waiting...","domainName":"sandbox.dev.spirl.net","status":"UPDATING","pollInterval":"5s"}
{"level":"info","ts":1766181454.0205457,"msg":"Domain name is available and fully deployed","domainName":"sandbox.dev.spirl.net"}
{"level":"info","ts":1766181454.0205724,"msg":"Successfully updated API Gateway with SPIFFE bundle!"}
END RequestId: 51ace3ea-c43f-4491-baf8-dfc671feeb85
REPORT RequestId: 51ace3ea-c43f-4491-baf8-dfc671feeb85 Duration: 16712.25 ms Billed Duration: 20513 ms Memory Size: 512 MB Max Memory Used: 53 MB Init Duration: 3799.83 ms
Key indicators of success:
- Each bundle endpoint is fetched successfully
- Certificates are added with valid notAfter dates in the future
- Bundle is uploaded to S3 with a new versionID (only if required)
- API Gateway transitions from UPDATING to AVAILABLE
- Final message: "Successfully updated API Gateway with SPIFFE bundle!"
If bundles haven't changed since the last run, you won't see "New bundle uploaded to S3". This is normal: spirl-sync only updates S3 when certificates change.
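Since the upload message only appears when the bundle changes, it can be turned into a CloudWatch metric to see how often the trust store is actually rotated. The following is a sketch, assuming the log group is the aws_cloudwatch_log_group.spirl_sync resource from Step 3 and that the msg field keeps the wording shown in the example log; the filter name, metric name, and namespace are hypothetical.

```hcl
# Count log lines reporting that a new bundle was written to S3
resource "aws_cloudwatch_log_metric_filter" "spirl_sync_bundle_uploads" {
  name           = "spirl-sync-bundle-uploads" # hypothetical name
  log_group_name = aws_cloudwatch_log_group.spirl_sync.name

  # JSON filter pattern matching the structured log line from the example
  pattern = "{ $.msg = \"New bundle uploaded to S3\" }"

  metric_transformation {
    name      = "BundleUploads" # hypothetical metric name
    namespace = "SpirlSync"     # hypothetical custom namespace
    value     = "1"
  }
}
```

A long stretch with no uploads is not an error by itself, so this metric is better suited to a dashboard than to an alarm.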
Common errors and solutions
Error: Cannot access bundle endpoint
{
"errorMessage": "failed to download and combine bundles: downloading bundle from https://fed.spirl.org/t-example/td-test/bundle: fetching bundle: Get \"https://fed.spirl.org/t-example/td-test/bundle\": dial tcp: lookup fed.spirl.org on 169.254.78.1:53: no such host",
"errorType": "wrapError"
}
Cause: Network connectivity issue or incorrect bundle endpoint URL
Solutions:
- Verify your BUNDLE_ENDPOINTS environment variable is correct (check for typos)
- Ensure your Lambda has internet access (via NAT Gateway if in a VPC)
- Check that VPC security groups allow outbound HTTPS traffic
- Verify the federation endpoint URL with spirlctl trust-domain info <domain>
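If the Lambda function is attached to a VPC, the outbound HTTPS rule mentioned above could be sketched as a dedicated security group. This only applies to VPC-attached functions; the guide's Step 3 configuration does not place the function in a VPC. The variable vpc_id and the security group name are hypothetical.

```hcl
# Hypothetical input: the VPC the Lambda function is attached to
variable "vpc_id" {
  type = string
}

# Security group allowing only outbound HTTPS from the Lambda function
resource "aws_security_group" "spirl_sync_lambda" {
  name        = "spirl-sync-lambda" # hypothetical name
  description = "Outbound HTTPS for the spirl-sync Lambda function"
  vpc_id      = var.vpc_id

  egress {
    description = "HTTPS to the bundle endpoint and AWS APIs"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

The group would then be referenced from a vpc_config block on the Lambda function (security_group_ids together with subnet_ids for private subnets that route through a NAT Gateway), and the execution role would additionally need the AWS managed AWSLambdaVPCAccessExecutionRole policy.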
Error: API Gateway access denied
{
"errorMessage": "failed to get current APIGateway Domain Name Configuration: getting domain name configuration: operation error API Gateway: GetDomainName, https response error StatusCode: 403, api error AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/spirl_sync_lambda_role/spirl-sync is not authorized to perform: apigateway:GET on resource: arn:aws:apigateway:us-west-2::/domainnames/example.com",
"errorType": "wrapError"
}
Cause: Lambda execution role lacks necessary API Gateway permissions
Solutions:
- Verify the IAM policy attached to your Lambda role includes apigateway:GET
- Verify the domain name configured in Lambda matches the domain name configured in API Gateway
- Check the Resource ARN in the policy matches your domain name exactly
- Ensure you've applied the Terraform changes after updating IAM policies
- Review the example IAM policy in Step 3 of the setup guide
Error: S3 access denied
{
"errorMessage": "failed to upload bundle to S3: operation error S3: PutObject, https response error StatusCode: 403, AccessDenied: Access Denied",
"errorType": "wrapError"
}
Cause: Lambda execution role cannot write to the S3 bucket
Solutions:
- Verify the Lambda role has s3:PutObject permission for the bucket
- Check S3 bucket policy doesn't explicitly deny the Lambda role
- Ensure the bucket name in S3_BUCKET_NAME is correct
- Verify the S3 bucket exists in the same AWS account
Error: Lambda timeout
Task timed out after 600.00 seconds
Cause: API Gateway update is taking longer than the Lambda timeout (usually during initial setup)
Solutions:
- Increase Lambda timeout to 900 seconds (15 minutes) for initial configuration
- Initial domain name mTLS configuration can take 5-10 minutes
- Once established, subsequent runs complete in seconds
- Check API Gateway domain status in AWS Console for manual verification