
Using AWS ALB with Defakto Server

This guide shows how to expose your Defakto Trust Domain Servers running on Amazon EKS to agents using an AWS Application Load Balancer (ALB). The ALB provides TLS termination, health checking, and load distribution across your server pods.

You can configure the ALB using kubectl, Terraform, or Helm depending on your infrastructure management approach.

info

This guide assumes you have already deployed Defakto Trust Domain Servers. If not, see Deploy Defakto Trust Domain Servers first.

Prerequisites​

Before configuring the ALB, ensure you have:

  • EKS cluster with worker nodes where Defakto servers will run
  • AWS Load Balancer Controller installed in your cluster to manage ALB resources (installation guide)
  • ACM certificate for your agent endpoint domain (e.g., agents.acm.example.com) to enable HTTPS
  • Active Defakto trust domain and deployment created via spirlctl
  • spirlctl CLI installed and authenticated to retrieve deployment information

Configuration​

Choose your deployment method:

Create a Kubernetes Ingress resource using kubectl.

info

The Defakto server helm chart can create the Ingress as part of the installation. This example is provided for reference if you are not using the Ingress resource from the helm chart.

Replace YOUR_TD_DEPLOYMENT_ID with your deployment ID from spirlctl trust-domain deployment list.

spirl-server-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spirl-server-agent
  namespace: YOUR_TD_DEPLOYMENT_ID # e.g., tdd-abc123xyz
  annotations:
    # Load balancer type and exposure
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip

    # gRPC protocol configuration
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC

    # TLS configuration
    # (optional) If an ACM certificate is pre-provisioned, point to its ARN.
    # Otherwise, allow the load balancer controller to create the certificate.
    # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09

    # Health check configuration for gRPC
    alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
    alb.ingress.kubernetes.io/success-codes: "0"

    # Load balancing and timeout settings
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
spec:
  ingressClassName: alb
  rules:
    - host: agents.acm.example.com # Your agent endpoint domain name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: YOUR_TD_DEPLOYMENT_ID-spirl-server-agent
                port:
                  number: 80

Apply the configuration:

kubectl apply -f spirl-server-ingress.yaml
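Provisioning the ALB typically takes a minute or two. You can watch progress and catch configuration errors in the ingress events (the ingress name matches the manifest above; adjust if yours differs):

```shell
kubectl describe ingress spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID
```

Events from the AWS Load Balancer Controller appear at the bottom of the output, so reconciliation errors usually surface here before anything appears in the AWS Console.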

DNS Setup​

After the ALB is created, you need to configure DNS so agents can reach your Defakto servers using a domain name (e.g., agents.acm.example.com).

Get the ALB address:

kubectl get ingress -n YOUR_TD_DEPLOYMENT_ID

Look for the ADDRESS column in the output. This will be an AWS hostname like k8s-....elb.amazonaws.com.

Create a DNS record:

  • Route 53: Create an A record with alias target pointing to the ALB
  • Other DNS providers: Create a CNAME record pointing to the ALB address

The DNS record should map your agent endpoint domain (e.g., agents.acm.example.com) to the ALB address.
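For Route 53, the alias record can also be created from the command line. The sketch below uses placeholder values throughout: YOUR_HOSTED_ZONE_ID is your domain's hosted zone, and the AliasTarget fields come from the ALB itself (its CanonicalHostedZoneId and DNS name are returned by aws elbv2 describe-load-balancers):

```shell
# Change batch for the alias record; all values here are placeholders.
cat > change-batch.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "agents.acm.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "ALB_CANONICAL_HOSTED_ZONE_ID",
        "DNSName": "k8s-example.elb.amazonaws.com",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF

# UPSERT creates the record if absent, or updates it in place.
aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch file://change-batch.json
```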

Verification​

After completing the configuration, verify that everything is working correctly.

Test connectivity with spirldbg:

Use spirldbg to verify agents can reach the Defakto server through the ALB:

spirldbg network-diagnostics --agent-endpoint agents.acm.example.com:443

Replace agents.acm.example.com with your actual agent endpoint domain. The command performs automatic checks including:

  • Network connectivity to the endpoint
  • TLS certificate validation
  • Defakto server health checks

Verify timeout configuration:

Confirm the idle timeout is configured correctly in the AWS Console:

  1. Navigate to EC2 β†’ Load Balancers
  2. Find your ALB (filter by the deployment ID)
  3. Click the Attributes tab
  4. Verify Idle timeout is set to 4000 seconds
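The same check can be scripted with the AWS CLI; the load balancer ARN below is a placeholder for your ALB's ARN:

```shell
# Prints the current idle timeout value; expect 4000 if the annotation was applied.
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT:loadbalancer/app/NAME/ID \
  --query "Attributes[?Key=='idle_timeout.timeout_seconds'].Value" \
  --output text
```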

Understanding the Timeout Setting​

The 4000-second (roughly 66-minute) idle timeout is important for Defakto's architecture.

How agents connect to the trust domain servers:

Agents maintain persistent HTTP/2 connections to Defakto servers for real-time updates. These connections use bidirectional gRPC streams to handle:

  • Trust bundle rotation and federation configuration changes
  • Agent configuration updates

The default timeout problem:

AWS ALB has a default idle timeout of 60 seconds. When no data flows across a connection for 60 seconds, the ALB terminates it. While agents send HTTP/2 ping frames to keep the connection alive, ALB still considers the gRPC streams idle and resets them. This causes agents to constantly reconnect, increasing load on your servers.

If the timeout is too low, you may see errors like:

{
  "level": "error",
  "ts": 1773794743.4153817,
  "logger": "agent.bundleRefresher",
  "msg": "Scheduling bundle sync attempt",
  "after": 0.688756554,
  "attempt": 0,
  "error": "receiving SyncTrustBundles response: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR"
}
{
  "level": "error",
  "ts": 1773794743.414761,
  "logger": "agent.sessionClient",
  "msg": "finished call",
  "spirl_server_version": "0.33.0",
  "grpc.component": "client",
  "grpc.service": "com.spirl.private.common.api.resource.v1.Source",
  "grpc.method": "PullResources",
  "grpc.start_time": "2026-03-18T00:44:13Z",
  "grpc.code": "Internal",
  "grpc.error": "rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR",
  "grpc.time_ms": "90366.78"
}

The solution:

Setting idle_timeout.timeout_seconds=4000 (roughly 66 minutes) keeps connections open longer than the agent's default 30-minute maximum connection lifetime, so the agent closes and re-establishes connections on its own schedule instead of having them reset by the ALB. This lets agents reconnect naturally and optimize for the closest region.

ALB Annotation Reference​

These annotations configure the ALB for Defakto servers. Copy these into your ingress annotations section.

Load Balancer Timeout (Required)​

alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000

Sets the 66-minute idle timeout for agent connections. This is the most important setting; see "Understanding the Timeout Setting" for details.

gRPC Protocol Support (Required)​

alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/backend-protocol-version: GRPC

Enables HTTP/2 and gRPC support on the ALB.

Health Checks (Required)​

alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
alb.ingress.kubernetes.io/success-codes: "0"

Configures gRPC health checks. Note that gRPC health checks return code 0 for success, unlike HTTP REST APIs which use 200.
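To exercise the health check by hand, one option is grpcurl against a port-forwarded service. The local port, and the assumption that the backend speaks plaintext gRPC (h2c) on the service port, follow from the HTTP backend-protocol setting above but may need adjusting for your deployment:

```shell
# Forward a local port to the agent service in the deployment namespace.
kubectl port-forward -n YOUR_TD_DEPLOYMENT_ID \
  svc/YOUR_TD_DEPLOYMENT_ID-spirl-server-agent 8443:80 &

# Call the standard gRPC health service; a healthy server reports SERVING.
grpcurl -plaintext localhost:8443 grpc.health.v1.Health/Check
```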

Target Configuration (Required)​

alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09

Directs traffic to pod IPs, which is required for proper routing to Kubernetes pods in EKS, and serves HTTPS on port 443 with a TLS 1.3 security policy.

Load Balancer Accessibility (Optional)​

alb.ingress.kubernetes.io/scheme: internet-facing  # or 'internal'

Controls whether the ALB is publicly accessible (internet-facing) or private network only (internal).

TLS Certificate (Optional)​

alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID

Provide a pre-created ACM certificate. If omitted, the ALB controller creates a certificate automatically (requires DNS validation).


For a complete list of available annotations, see the Kubernetes SIG-AWS Load Balancer Controller documentation.

Troubleshooting​

Connection reset or "premature end of stream" errors​

Agents lose connectivity after approximately 60 seconds, and agent logs show connection reset or stream termination errors (see examples in "Understanding the Timeout Setting").

Solution:

  1. Verify the idle_timeout.timeout_seconds=4000 annotation is present in your ingress configuration
  2. Check ALB attributes in AWS Console (EC2 β†’ Load Balancers β†’ Select load balancer β†’ "Attributes" tab)
  3. If the timeout is incorrect, update your ingress configuration and reapply it
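To confirm step 1 without opening the manifest, read the annotation straight off the live ingress (the ingress name here matches the kubectl example in this guide; adjust if yours differs):

```shell
# Prints the load-balancer-attributes annotation; dots in the key must be escaped.
kubectl get ingress spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID \
  -o jsonpath='{.metadata.annotations.alb\.ingress\.kubernetes\.io/load-balancer-attributes}'
```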

Health check failures: Target.FailedHealthChecks​

In the AWS Console, the ALB target group shows targets as unhealthy, and no traffic reaches your Defakto servers.

Solution:

  1. Verify the success-codes annotation is set to "0"; gRPC health checks return code 0 for success, not HTTP 200 as REST APIs do
  2. Confirm the health check path is /grpc.health.v1.Health/Check
  3. Verify Defakto server pods are running: kubectl get pods -n YOUR_TD_DEPLOYMENT_ID
  4. Check pod logs for errors: kubectl logs -n YOUR_TD_DEPLOYMENT_ID <pod-name>

ALB not created: IngressClass not found​

After applying the ingress configuration, the ingress resource shows no ADDRESS after several minutes.

Solution:

  1. Verify the AWS Load Balancer Controller is installed: kubectl get deployment -n kube-system aws-load-balancer-controller
  2. Check the alb IngressClass exists: kubectl get ingressclass alb
  3. If either is missing, install the AWS Load Balancer Controller following the AWS installation guide

TLS errors: certificate verify failed​

Agents or spirldbg report TLS certificate validation errors when connecting to the agent endpoint.

Solution:

  1. Verify the ACM certificate ARN in the certificate-arn annotation is correct
  2. Confirm the certificate covers your agent endpoint domain, and check the certificate's Subject Alternative Names (SANs) in ACM
  3. Ensure the certificate is in the same AWS region as your EKS cluster and ALB
  4. Check the certificate status is "Issued" (not "Pending Validation") in the ACM console
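To inspect the certificate the ALB actually presents, openssl can help; the domain below is the example from this guide, so substitute your own:

```shell
# Print subject, SANs, and validity window of the served certificate.
openssl s_client -connect agents.acm.example.com:443 \
  -servername agents.acm.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName -dates
```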

ALB created but no traffic reaches pods​

The ALB exists and shows as healthy in AWS Console, but agents cannot connect or spirldbg fails.

Solution:

  1. Verify security groups allow traffic from the ALB to your EKS worker nodes on port 80 (or your configured service port)
  2. Check the service exists: kubectl get service -n YOUR_TD_DEPLOYMENT_ID | grep spirl-server-agent
  3. Verify the service has healthy endpoints (pod IPs): kubectl get endpoints YOUR_TD_DEPLOYMENT_ID-spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID
  4. Review AWS Load Balancer Controller logs for errors: kubectl logs -n kube-system deployment/aws-load-balancer-controller
  5. In AWS Console, verify the ALB target groups are targeting the correct server pods running on EKS
