
Using AWS ALB with Defakto Server

This guide shows how to expose your Defakto Trust Domain Servers running on Amazon EKS to agents using an AWS Application Load Balancer (ALB). The ALB provides TLS termination, health checking, and load distribution across your server pods.

You can configure the ALB using kubectl, Terraform, or Helm depending on your infrastructure management approach.

info

This guide assumes you have already deployed Defakto Trust Domain Servers. If not, see Deploy Defakto Trust Domain Servers first.

Prerequisites​

Before configuring the ALB, ensure you have:

  • EKS cluster with worker nodes where Defakto servers will run
  • AWS Load Balancer Controller installed in your cluster to manage ALB resources (installation guide)
  • ACM certificate for your agent endpoint domain (e.g., agents.acm.example.com) to enable HTTPS
  • Active Defakto trust domain and deployment created via spirlctl
  • spirlctl CLI installed and authenticated to retrieve deployment information

Configuration​

Choose your deployment method:

Create a Kubernetes Ingress resource using kubectl.

info

The Defakto server helm chart can create the Ingress as part of the installation. This example is provided for reference if you are not using the Ingress resource from the helm chart.

Replace YOUR_TD_DEPLOYMENT_ID with your deployment ID from spirlctl trust-domain deployment list.

spirl-server-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spirl-server-agent
  namespace: YOUR_TD_DEPLOYMENT_ID # e.g., tdd-abc123xyz
  annotations:
    # Load balancer type and exposure
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip

    # gRPC protocol configuration
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC

    # TLS configuration
    # (optional) If an ACM certificate is pre-provisioned, point to its ARN.
    # Otherwise, allow the load balancer controller to create the certificate.
    # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09

    # Health check configuration for gRPC
    alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
    alb.ingress.kubernetes.io/success-codes: "0"

    # Load balancing and timeout settings
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
spec:
  ingressClassName: alb
  rules:
    - host: agents.acm.example.com # Your agent endpoint domain name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: YOUR_TD_DEPLOYMENT_ID-spirl-server-agent
                port:
                  number: 80

Apply the configuration:

kubectl apply -f spirl-server-ingress.yaml
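Provisioning the ALB typically takes a minute or two. You can watch progress and catch configuration errors in the ingress events (the ingress name matches the manifest above; adjust if yours differs):

```shell
kubectl describe ingress spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID
```

Events from the AWS Load Balancer Controller appear at the bottom of the output, so reconciliation errors usually surface here before anything appears in the AWS Console.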

DNS Setup​

After the ALB is created, you need to configure DNS so agents can reach your Defakto servers using a domain name (e.g., agents.acm.example.com).

Get the ALB address:

kubectl get ingress -n YOUR_TD_DEPLOYMENT_ID

Look for the ADDRESS column in the output. This will be an AWS hostname like k8s-....elb.amazonaws.com.

Create a DNS record:

  • Route 53: Create an A record with alias target pointing to the ALB
  • Other DNS providers: Create a CNAME record pointing to the ALB address

The DNS record should map your agent endpoint domain (e.g., agents.acm.example.com) to the ALB address.
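For Route 53, the alias record can also be created from the command line. The sketch below uses placeholder values throughout: YOUR_HOSTED_ZONE_ID is your domain's hosted zone, and the AliasTarget fields come from the ALB itself (its CanonicalHostedZoneId and DNS name are returned by aws elbv2 describe-load-balancers):

```shell
# Change batch for the alias record; all values here are placeholders.
cat > change-batch.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "agents.acm.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "ALB_CANONICAL_HOSTED_ZONE_ID",
        "DNSName": "k8s-example.elb.amazonaws.com",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF

# UPSERT creates the record if absent, or updates it in place.
aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch file://change-batch.json
```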

Verification​

After completing the configuration, verify that everything is working correctly.

Test connectivity with spirldbg:

Use spirldbg to verify agents can reach the Defakto server through the ALB:

spirldbg network-diagnostics --agent-endpoint agents.acm.example.com:443

Replace agents.acm.example.com with your actual agent endpoint domain. The command performs automatic checks including:

  • Network connectivity to the endpoint
  • TLS certificate validation
  • Defakto server health checks

Verify timeout configuration:

Confirm the idle timeout is configured correctly in the AWS Console:

  1. Navigate to EC2 β†’ Load Balancers
  2. Find your ALB (filter by the deployment ID)
  3. Click the Attributes tab
  4. Verify Idle timeout is set to 4000 seconds
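The same check can be scripted with the AWS CLI; the load balancer ARN below is a placeholder for your ALB's ARN:

```shell
# Prints the current idle timeout value; expect 4000 if the annotation was applied.
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT:loadbalancer/app/NAME/ID \
  --query "Attributes[?Key=='idle_timeout.timeout_seconds'].Value" \
  --output text
```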

Understanding the Timeout Setting​

The 4000-second (roughly 66-minute) idle timeout is important for Defakto's architecture.

How agents connect to the trust domain servers:

Agents maintain persistent HTTP/2 connections to Defakto servers for real-time updates. These connections use bidirectional gRPC streams to handle:

  • Trust bundle rotation and federation configuration changes
  • Agent configuration updates

The default timeout problem:

AWS ALB has a default idle timeout of 60 seconds. When no data flows across a connection for 60 seconds, the ALB terminates it. While agents send HTTP/2 ping frames to keep the connection alive, ALB still considers the gRPC streams idle and resets them. This causes agents to constantly reconnect, increasing load on your servers.

If the timeout is too low, you may see errors like:

{
  "level": "error",
  "ts": 1773794743.4153817,
  "logger": "agent.bundleRefresher",
  "msg": "Scheduling bundle sync attempt",
  "after": 0.688756554,
  "attempt": 0,
  "error": "receiving SyncTrustBundles response: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR"
}
{
  "level": "error",
  "ts": 1773794743.414761,
  "logger": "agent.sessionClient",
  "msg": "finished call",
  "spirl_server_version": "0.33.0",
  "grpc.component": "client",
  "grpc.service": "com.spirl.private.common.api.resource.v1.Source",
  "grpc.method": "PullResources",
  "grpc.start_time": "2026-03-18T00:44:13Z",
  "grpc.code": "Internal",
  "grpc.error": "rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR",
  "grpc.time_ms": "90366.78"
}

The solution:

Setting idle_timeout.timeout_seconds=4000 (roughly 66 minutes) keeps connections open longer than the agent's default 30-minute maximum connection lifetime, so the agent closes and re-establishes connections on its own schedule instead of having them reset by the ALB. This lets agents reconnect naturally and optimize for the closest region.

ALB Annotation Reference​

These annotations configure the ALB for Defakto servers. Copy these into your ingress annotations section.

Load Balancer Timeout (Required)​

alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000

Sets the 66-minute idle timeout for agent connections. This is the most important setting; see "Understanding the Timeout Setting" for details.

gRPC Protocol Support (Required)​

alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/backend-protocol-version: GRPC

Enables HTTP/2 and gRPC support on the ALB.

Health Checks (Required)​

alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
alb.ingress.kubernetes.io/success-codes: "0"

Configures gRPC health checks. Note that gRPC health checks return code 0 for success, unlike HTTP REST APIs which use 200.
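To exercise the health check by hand, one option is grpcurl against a port-forwarded service. The local port, and the assumption that the backend speaks plaintext gRPC (h2c) on the service port, follow from the HTTP backend-protocol setting above but may need adjusting for your deployment:

```shell
# Forward a local port to the agent service in the deployment namespace.
kubectl port-forward -n YOUR_TD_DEPLOYMENT_ID \
  svc/YOUR_TD_DEPLOYMENT_ID-spirl-server-agent 8443:80 &

# Call the standard gRPC health service; a healthy server reports SERVING.
grpcurl -plaintext localhost:8443 grpc.health.v1.Health/Check
```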

Target Configuration (Required)​

alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-3-PQ-2025-09

Directs traffic to pod IPs, which is required for proper routing to Kubernetes pods in EKS, and serves HTTPS on port 443 with a TLS 1.3 security policy.

Load Balancer Accessibility (Optional)​

alb.ingress.kubernetes.io/scheme: internet-facing  # or 'internal'

Controls whether the ALB is publicly accessible (internet-facing) or private network only (internal).

TLS Certificate (Optional)​

alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID

Provide a pre-created ACM certificate. If omitted, the ALB controller creates a certificate automatically (requires DNS validation).


For a complete list of available annotations, see the Kubernetes SIG-AWS Load Balancer Controller documentation.

Troubleshooting​

Connection reset or "premature end of stream" errors​

Agents lose connectivity after approximately 60 seconds, and agent logs show connection reset or stream termination errors (see examples in "Understanding the Timeout Setting").

Solution:

  1. Verify the idle_timeout.timeout_seconds=4000 annotation is present in your ingress configuration
  2. Check ALB attributes in AWS Console (EC2 β†’ Load Balancers β†’ Select load balancer β†’ "Attributes" tab)
  3. If the timeout is incorrect, update your ingress configuration and reapply it
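To confirm step 1 without opening the manifest, read the annotation straight off the live ingress (the ingress name here matches the kubectl example in this guide; adjust if yours differs):

```shell
# Prints the load-balancer-attributes annotation; dots in the key must be escaped.
kubectl get ingress spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID \
  -o jsonpath='{.metadata.annotations.alb\.ingress\.kubernetes\.io/load-balancer-attributes}'
```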

Health check failures: Target.FailedHealthChecks​

In the AWS Console, the ALB target group shows targets as unhealthy, and no traffic reaches your Defakto servers.

Solution:

  1. Verify the success-codes annotation is set to "0"; gRPC health checks return code 0 for success, not HTTP 200 as REST APIs do
  2. Confirm the health check path is /grpc.health.v1.Health/Check
  3. Verify Defakto server pods are running: kubectl get pods -n YOUR_TD_DEPLOYMENT_ID
  4. Check pod logs for errors: kubectl logs -n YOUR_TD_DEPLOYMENT_ID <pod-name>

ALB not created: IngressClass not found​

After applying the ingress configuration, the ingress resource shows no ADDRESS after several minutes.

Solution:

  1. Verify the AWS Load Balancer Controller is installed: kubectl get deployment -n kube-system aws-load-balancer-controller
  2. Check the alb IngressClass exists: kubectl get ingressclass alb
  3. If either is missing, install the AWS Load Balancer Controller following the AWS installation guide

TLS errors: certificate verify failed​

Agents or spirldbg report TLS certificate validation errors when connecting to the agent endpoint.

Solution:

  1. Verify the ACM certificate ARN in the certificate-arn annotation is correct
  2. Confirm the certificate covers your agent endpoint domain, and check the certificate's Subject Alternative Names (SANs) in ACM
  3. Ensure the certificate is in the same AWS region as your EKS cluster and ALB
  4. Check the certificate status is "Issued" (not "Pending Validation") in the ACM console
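To inspect the certificate the ALB actually presents, openssl can help; the domain below is the example from this guide, so substitute your own:

```shell
# Print subject, SANs, and validity window of the served certificate.
openssl s_client -connect agents.acm.example.com:443 \
  -servername agents.acm.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName -dates
```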

ALB created but no traffic reaches pods​

The ALB exists and shows as healthy in AWS Console, but agents cannot connect or spirldbg fails.

Solution:

  1. Verify security groups allow traffic from the ALB to your EKS worker nodes on port 80 (or your configured service port)
  2. Check the service exists: kubectl get service -n YOUR_TD_DEPLOYMENT_ID | grep spirl-server-agent
  3. Verify the service has healthy endpoints (pod IPs): kubectl get endpoints YOUR_TD_DEPLOYMENT_ID-spirl-server-agent -n YOUR_TD_DEPLOYMENT_ID
  4. Review AWS Load Balancer Controller logs for errors: kubectl logs -n kube-system deployment/aws-load-balancer-controller
  5. In AWS Console, verify the ALB target groups are targeting the correct server pods running on EKS
