Using AWS ALB with Defakto Server
This guide shows how to expose your Defakto Trust Domain Servers running on Amazon EKS to agents using an AWS Application Load Balancer (ALB). The ALB provides TLS termination, health checking, and load distribution across your server pods.
You can configure the ALB using kubectl, Terraform, or Helm depending on your infrastructure management approach.
This guide assumes you have already deployed Defakto Trust Domain Servers. If not, see Deploy Defakto Trust Domain Servers first.
Prerequisites
Before configuring the ALB, ensure you have:
- EKS cluster with worker nodes where Defakto servers will run
- AWS Load Balancer Controller installed in your cluster to manage ALB resources (installation guide)
- ACM certificate for your agent endpoint domain (e.g., `agents.acm.example.com`) to enable HTTPS
- Active Defakto trust domain and deployment created via `spirlctl`
- `spirlctl` CLI installed and authenticated to retrieve deployment information
Configuration
Choose your deployment method:
- Manual (kubectl)
- Helm Chart
- Terraform
Create a Kubernetes Ingress resource using kubectl.
The Defakto server Helm chart can create the Ingress as part of the installation; this standalone example is provided for reference if you are not using the chart's Ingress resource.
Replace `YOUR_TD_DEPLOYMENT_ID` with your deployment ID from `spirlctl trust-domain deployment list`.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spirl-server-agent
  namespace: YOUR_TD_DEPLOYMENT_ID # e.g., tdd-abc123xyz
  annotations:
    # Load balancer type and exposure
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # gRPC protocol configuration
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC
    # TLS configuration
    # (optional) If an ACM certificate is pre-provisioned, point to its ARN.
    # Otherwise, the load balancer controller discovers a matching ACM certificate for the host.
    # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09
    # Health check configuration for gRPC
    alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
    alb.ingress.kubernetes.io/success-codes: "0"
    # Load balancing and timeout settings
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
spec:
  ingressClassName: alb
  rules:
    - host: agents.acm.example.com # Your agent endpoint domain name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: YOUR_TD_DEPLOYMENT_ID-spirl-server-agent
                port:
                  number: 80
```
Apply the configuration:
```shell
kubectl apply -f spirl-server-ingress.yaml
```
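Provisioning the ALB typically takes a few minutes. One way to follow progress is to watch the ingress until the controller populates its address:

```shell
# The ADDRESS column stays empty until the ALB has been provisioned.
kubectl get ingress spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID --watch
```

This command is environment-dependent and assumes the ingress name from the manifest above.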
Configure ALB ingress using Helm values:
```yaml
trustDomainDeployment:
  # Get these values from spirlctl commands
  trustDomainName: "acm.example.com"
  trustDomainID: "td-abc123xyz"
  id: "tdd-def456uvw"
  name: "production"

  # Service configuration
  service:
    type: ClusterIP
    port: 80

  # ALB Ingress configuration
  ingress:
    enabled: true
    className: alb
    annotations:
      # Load balancer type and exposure
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      # gRPC protocol configuration
      alb.ingress.kubernetes.io/backend-protocol: HTTP
      alb.ingress.kubernetes.io/backend-protocol-version: GRPC
      # TLS configuration - (optional) uncomment and replace with your certificate ARN
      # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abcd-1234
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
      alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09
      # Health check configuration for gRPC
      alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
      alb.ingress.kubernetes.io/success-codes: "0"
      # Load balancing and timeout settings
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
    hosts:
      - host: agents.acm.example.com # Your agent endpoint domain name
        paths:
          - path: /
            pathType: Prefix

controlPlane:
  auth:
    key:
      id: "tdk-xyz789abc" # From spirlctl trust-domain key create
      pem: |
        -----BEGIN PRIVATE KEY-----
        YOUR_PRIVATE_KEY_HERE
        -----END PRIVATE KEY-----

telemetry:
  enabled: true
  collectors:
    grpc:
      emitLatencyMetrics: true
  metricsAPI:
    port: 9090
```
Install or upgrade the Defakto server:
```shell
helm upgrade --install YOUR_TD_DEPLOYMENT_ID \
  oci://ghcr.io/spirl/charts/spirl-server \
  --namespace YOUR_TD_DEPLOYMENT_ID \
  --create-namespace \
  --values spirl-server-values.yaml
```
Deploy Defakto Server with ALB ingress using Terraform:
```hcl
# ============================================================================
# Register the Defakto Server and Deployment
# ============================================================================
resource "spirl_trust_domain" "prod" {
  domain_name = "acm.example.com"
  description = "EKS ALB demo trust domain"
}

# Note: spirl_key_pair is not recommended without careful consideration,
# because it stores the private key in Terraform state files.
resource "spirl_key_pair" "deployment" {
  algorithm = "ed25519"
}

resource "spirl_trust_domain_deployment" "deployment" {
  trust_domain_id = spirl_trust_domain.prod.id
  name            = "eks-alb-demo"
  keys = {
    "key1" = {
      public_key = spirl_key_pair.deployment.public_key_pem
      active     = true
    }
  }
}

# ============================================================================
# Defakto Server Helm Release with ALB Ingress
# ============================================================================
resource "helm_release" "spirl_server" {
  name             = spirl_trust_domain_deployment.deployment.id
  repository       = "oci://ghcr.io/spirl/charts"
  chart            = "spirl-server"
  namespace        = spirl_trust_domain_deployment.deployment.id
  create_namespace = true
  wait             = false

  values = [
    yamlencode({
      trustDomainDeployment = {
        trustDomainName = spirl_trust_domain.prod.domain_name
        trustDomainID   = spirl_trust_domain.prod.id
        id              = spirl_trust_domain_deployment.deployment.id
        name            = spirl_trust_domain_deployment.deployment.name

        deployment = {
          replicaCount = 2
        }
        serviceAccount = {
          create = true
        }
        service = {
          type = "ClusterIP"
          port = 80
        }

        # ALB Ingress Configuration
        ingress = {
          enabled   = true
          className = "alb"
          annotations = {
            # Load balancer type and exposure
            "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
            "alb.ingress.kubernetes.io/target-type" = "ip"
            # gRPC protocol configuration
            "alb.ingress.kubernetes.io/backend-protocol"         = "HTTP"
            "alb.ingress.kubernetes.io/backend-protocol-version" = "GRPC"
            # TLS configuration - (optional) uncomment and add your certificate ARN
            # "alb.ingress.kubernetes.io/certificate-arn" = aws_acm_certificate.alb_cert.arn
            "alb.ingress.kubernetes.io/listen-ports" = jsonencode([{ HTTPS = 443 }])
            "alb.ingress.kubernetes.io/ssl-policy"   = "ELBSecurityPolicy-TLS13-1-3-PQ-2025-09"
            # Health check configuration for gRPC
            "alb.ingress.kubernetes.io/success-codes"    = "0"
            "alb.ingress.kubernetes.io/healthcheck-path" = "/grpc.health.v1.Health/Check"
            # Load balancing and timeout settings
            "alb.ingress.kubernetes.io/load-balancer-attributes" = "idle_timeout.timeout_seconds=4000"
          }
          hosts = [
            {
              host = "agents.acm.example.com" # Your agent endpoint domain
              paths = [
                {
                  path     = "/"
                  pathType = "Prefix"
                }
              ]
            }
          ]
        }
      }

      controlPlane = {
        auth = {
          key = {
            id  = spirl_trust_domain_deployment.deployment.keys["key1"].id
            pem = spirl_key_pair.deployment.private_key_pem
          }
        }
      }

      telemetry = {
        enabled = true
        collectors = {
          grpc = {
            emitLatencyMetrics = true
          }
        }
        metricsAPI = {
          port = 9090
        }
      }
    })
  ]
}
```
Apply with Terraform:
```shell
terraform init
terraform plan
terraform apply
```
DNS Setup
After the ALB is created, you need to configure DNS so agents can reach your Defakto servers using a domain name (e.g., agents.acm.example.com).
Get the ALB address:
```shell
kubectl get ingress YOUR_TD_DEPLOYMENT_ID-spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID
```
Look for the ADDRESS column in the output. This will be an AWS hostname like k8s-....elb.amazonaws.com.
Create a DNS record:
- Route 53: Create an A record with alias target pointing to the ALB
- Other DNS providers: Create a CNAME record pointing to the ALB address
The DNS record should map your agent endpoint domain (e.g., agents.acm.example.com) to the ALB address.
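If you use Route 53 and prefer the CLI, the CNAME can be created in a single call. This is a sketch: the hosted zone ID and ALB hostname below are placeholders you must substitute with your own values.

```shell
# UPSERT a CNAME mapping the agent endpoint domain to the ALB hostname.
# Z0123456789EXAMPLE and the ALB DNS name are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "agents.acm.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "k8s-example-1234567890.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'
```

For ALBs, a Route 53 alias A record (as noted above) avoids CNAME lookups, but a plain CNAME works with any DNS provider.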
Verification
After completing the configuration, verify that everything is working correctly.
Test connectivity with spirldbg:
Use spirldbg to verify agents can reach the Defakto server through the ALB:
```shell
spirldbg network-diagnostics --agent-endpoint agents.acm.example.com:443
```
Replace agents.acm.example.com with your actual agent endpoint domain. The command performs automatic checks including:
- Network connectivity to the endpoint
- TLS certificate validation
- Defakto server health checks
Verify timeout configuration:
Confirm the idle timeout is configured correctly in the AWS Console:
- Navigate to EC2 → Load Balancers
- Find your ALB (filter by the deployment ID)
- Click the Attributes tab
- Verify Idle timeout is set to 4000 seconds
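The same check can be scripted with the AWS CLI; the load balancer ARN below is a placeholder for your ALB's ARN.

```shell
# Prints the configured idle timeout in seconds (expect 4000).
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT:loadbalancer/app/NAME/ID \
  --query "Attributes[?Key=='idle_timeout.timeout_seconds'].Value" \
  --output text
```

You can find the ARN with `aws elbv2 describe-load-balancers` or from the AWS Console.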
Understanding the Timeout Setting
The 4000-second (66-minute) timeout is important for Defakto's architecture.
How agents connect to the trust domain servers:
Agents maintain persistent HTTP/2 connections to Defakto servers for real-time updates. These connections use bidirectional gRPC streams to handle:
- Trust bundle rotation and federation configuration changes
- Agent configuration updates
The default timeout problem:
AWS ALB has a default idle timeout of 60 seconds. When no data flows across a connection for 60 seconds, the ALB terminates it. While agents send HTTP/2 ping frames to keep the connection alive, ALB still considers the gRPC streams idle and resets them. This causes agents to constantly reconnect, increasing load on your servers.
If the timeout is too low, you may see errors like:
```json
{
  "level": "error",
  "ts": 1773794743.4153817,
  "logger": "agent.bundleRefresher",
  "msg": "Scheduling bundle sync attempt",
  "after": 0.688756554,
  "attempt": 0,
  "error": "receiving SyncTrustBundles response: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR"
}
{
  "level": "error",
  "ts": 1773794743.414761,
  "logger": "agent.sessionClient",
  "msg": "finished call",
  "spirl_server_version": "0.33.0",
  "grpc.component": "client",
  "grpc.service": "com.spirl.private.common.api.resource.v1.Source",
  "grpc.method": "PullResources",
  "grpc.start_time": "2026-03-18T00:44:13Z",
  "grpc.code": "Internal",
  "grpc.error": "rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR",
  "grpc.time_ms": "90366.78"
}
```
The solution:
Setting idle_timeout.timeout_seconds=4000 (66 minutes) allows connections to remain open longer than the agent's default 30-minute maximum connection lifetime. This lets agents reconnect naturally to optimize for the closest region.
ALB Annotation Reference
These annotations configure the ALB for Defakto servers. Copy these into your ingress annotations section.
Load Balancer Timeout (Required)
```yaml
alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
```
Sets the 66-minute idle timeout for agent connections. This is the most important setting; see "Understanding the Timeout Setting" for details.
gRPC Protocol Support (Required)
```yaml
alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/backend-protocol-version: GRPC
```
Enables HTTP/2 and gRPC support on the ALB.
Health Checks (Required)
```yaml
alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
alb.ingress.kubernetes.io/success-codes: "0"
```
Configures gRPC health checks. Note that gRPC health checks return code 0 for success, unlike HTTP REST APIs which use 200.
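If health checks keep failing, you can exercise the same endpoint by hand. This sketch assumes `grpcurl` is installed and that a server pod serves plaintext gRPC on port 8080; the pod name and port are placeholders, so substitute the port your `spirl-server-agent` service actually targets.

```shell
# Forward a local port to a server pod (pod name and port are placeholders).
kubectl -n YOUR_TD_DEPLOYMENT_ID port-forward pod/POD_NAME 8080:8080 &

# Call the standard gRPC health service; a healthy server replies with status SERVING.
grpcurl -plaintext localhost:8080 grpc.health.v1.Health/Check
```

A status code of 0 (OK) on this call is what the ALB's `success-codes: "0"` annotation matches.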
Target Configuration (Required)
```yaml
alb.ingress.kubernetes.io/target-type: ip
```
Directs traffic to pod IPs in EKS. Required for proper routing to Kubernetes pods.
TLS/SSL Policy (Recommended)
```yaml
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09
alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
```
Terminates TLS on port 443 with the specified security policy.
Load Balancer Accessibility (Optional)
```yaml
alb.ingress.kubernetes.io/scheme: internet-facing # or 'internal'
```
Controls whether the ALB is publicly accessible (internet-facing) or restricted to private networks (internal).
TLS Certificate (Optional)
```yaml
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID
```
Provides a pre-created ACM certificate. If omitted, the AWS Load Balancer Controller attempts to discover a matching, already-validated ACM certificate for the ingress host automatically.
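To look up the ARN of an issued certificate for your domain, one option is the ACM CLI; the domain name below is a placeholder for your agent endpoint domain.

```shell
# Prints the ARN of the issued ACM certificate for the given domain, if any.
aws acm list-certificates \
  --certificate-statuses ISSUED \
  --query "CertificateSummaryList[?DomainName=='agents.acm.example.com'].CertificateArn" \
  --output text
```

Run this in the same region as your ALB, since ACM certificates are regional.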
For a complete list of available annotations, see the Kubernetes SIG-AWS Load Balancer Controller documentation.
Troubleshooting
Connection reset or "premature end of stream" errors
Agents lose connectivity after approximately 60 seconds, and agent logs show connection reset or stream termination errors (see examples in "Understanding the Timeout Setting").
Solution:
- Verify the `idle_timeout.timeout_seconds=4000` annotation is present in your ingress configuration
- Check the ALB attributes in the AWS Console (EC2 → Load Balancers → select the load balancer → "Attributes" tab)
- If the timeout is incorrect, update your ingress configuration and reapply it
Health check failures: Target.FailedHealthChecks
In the AWS Console, the ALB target group shows targets as unhealthy, and no traffic reaches your Defakto servers.
Solution:
- Verify the `success-codes` annotation is set to `"0"`; gRPC health checks return code 0 for success, not HTTP 200 like REST APIs
- Confirm the health check path is `/grpc.health.v1.Health/Check`
- Verify Defakto server pods are running: `kubectl get pods -n YOUR_TD_DEPLOYMENT_ID`
- Check pod logs for errors: `kubectl logs -n YOUR_TD_DEPLOYMENT_ID <pod-name>`
ALB not created: IngressClass not found
After applying the ingress configuration, the ingress resource shows no ADDRESS after several minutes.
Solution:
- Verify the AWS Load Balancer Controller is installed: `kubectl get deployment -n kube-system aws-load-balancer-controller`
- Check that the `alb` IngressClass exists: `kubectl get ingressclass alb`
- If either is missing, install the AWS Load Balancer Controller following the AWS installation guide
TLS errors: certificate verify failed
Agents or spirldbg report TLS certificate validation errors when connecting to the agent endpoint.
Solution:
- Verify the ACM certificate ARN in the `certificate-arn` annotation is correct
- Confirm the certificate covers your agent endpoint domain, and check the certificate's Subject Alternative Names (SANs) in ACM
- Ensure the certificate is in the same AWS region as your EKS cluster and ALB
- Check the certificate status is "Issued" (not "Pending Validation") in the ACM console
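To see exactly which certificate the ALB is serving, one option is `openssl`; replace the hostname below with your actual agent endpoint domain.

```shell
# Fetch the served certificate and print its subject and validity window.
openssl s_client -connect agents.acm.example.com:443 \
  -servername agents.acm.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates
```

If the subject or SANs do not match your agent endpoint domain, the ALB is serving the wrong certificate.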
ALB created but no traffic reaches pods
The ALB exists and shows as healthy in AWS Console, but agents cannot connect or spirldbg fails.
Solution:
- Verify security groups allow traffic from the ALB to your EKS worker nodes on port 80 (or your configured service port)
- Check the service exists: `kubectl get service -n YOUR_TD_DEPLOYMENT_ID | grep spirl-server-agent`
- Verify the service has healthy endpoints (pod IPs): `kubectl get endpoints YOUR_TD_DEPLOYMENT_ID-spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID`
- Review AWS Load Balancer Controller logs for errors: `kubectl logs -n kube-system deployment/aws-load-balancer-controller`
- In the AWS Console, verify the ALB target groups are targeting the correct server pods running on EKS
See also: