SPIRL Agent Metrics

This guide covers metrics collection, configuration, and monitoring for SPIRL Agents. Agents expose Prometheus-compatible metrics for workload identity delivery, control plane connectivity, and resource utilization.

Enabling Metrics

Metrics are disabled by default to reduce overhead when they are not used. Once enabled, agents serve them on a configurable listen address (default: :9090).

Helm Installation

Enable metrics in your Helm values file:

agent-values.yaml
telemetry:
  enabled: true
  collectors:
    grpc:
      emitLatencyMetrics: false  # Keep disabled unless debugging
  metricsAPI:
    listenAddr: ":9090"
health:
  listenAddr: ":8080"  # Health check endpoint (optional)

Apply the configuration:

helm upgrade --install spirl-system \
  oci://ghcr.io/spirl/charts/spirl-system \
  --values agent-values.yaml

Linux Installation

Enable metrics via command-line flags:

--telemetry-metrics-api-listen-addr=":9090" \
--telemetry-enable-grpc-latency-monitoring=false \
--health-listen-addr=":8080"

Or set the equivalent environment variables (flag names uppercased, with dashes replaced by underscores):

TELEMETRY_METRICS_API_LISTEN_ADDR=":9090"
TELEMETRY_ENABLE_GRPC_LATENCY_MONITORING=false
HEALTH_LISTEN_ADDR=":8080"
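
The naming rule above can be sketched as a small helper. This is only an illustration inferred from the three flag and variable pairs shown here; the function name flag_to_env_var is hypothetical and is not part of the agent.

```python
def flag_to_env_var(flag: str) -> str:
    """Convert a flag such as '--health-listen-addr' to its environment variable name.

    Assumption: the rule is "drop leading dashes, replace dashes with
    underscores, uppercase", as suggested by the examples in this guide.
    """
    name = flag.lstrip("-")        # drop the leading '--'
    name = name.split("=", 1)[0]   # ignore any '=value' suffix
    return name.replace("-", "_").upper()
```

For example, flag_to_env_var("--telemetry-metrics-api-listen-addr") returns "TELEMETRY_METRICS_API_LISTEN_ADDR".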

Or configuration file:

agent-config.yaml
telemetry-metrics-api-listen-addr: ":9090"
telemetry-enable-grpc-latency-monitoring: false
health-listen-addr: ":8080"

Then start the agent with:

--config-file-path=/path/to/agent-config.yaml
# Or via environment variable:
# CONFIG_FILE_PATH=/path/to/agent-config.yaml

Verifying Metrics Endpoint

Helm Installation

Test that metrics are accessible:

# Locate an agent pod
kubectl -n spirl-system get po -l app=spirl-agent

# Port-forward to an agent pod (replace xxxxx with a pod name from above)
# The port-forward runs in the foreground and blocks the shell; press Ctrl+C to end the session
kubectl port-forward -n spirl-system spirl-agent-xxxxx 9090:9090

# In a separate shell, query the metrics endpoint
curl http://localhost:9090/metrics

Example:

> kubectl -n spirl-system get po -l app=spirl-agent
NAME                READY   STATUS    RESTARTS   AGE
spirl-agent-cw78f   1/1     Running   0          2d2h

> kubectl port-forward -n spirl-system spirl-agent-cw78f 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

> curl http://localhost:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000582792
go_gc_duration_seconds{quantile="0.25"} 0.00085675
go_gc_duration_seconds{quantile="0.5"} 0.002005745
go_gc_duration_seconds{quantile="0.75"} 0.088068597
go_gc_duration_seconds{quantile="1"} 2.609688464
go_gc_duration_seconds_sum 25.650164805
go_gc_duration_seconds_count 497
...

Linux Installation

Test that metrics are accessible locally:

# Query the metrics endpoint directly
curl http://localhost:9090/metrics

If the agent is running on a remote host, you can query it via SSH:

# Query metrics on remote host
ssh user@agent-host 'curl http://localhost:9090/metrics'

Example output:

> curl http://localhost:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000582792
go_gc_duration_seconds{quantile="0.25"} 0.00085675
go_gc_duration_seconds{quantile="0.5"} 0.002005745
go_gc_duration_seconds{quantile="0.75"} 0.088068597
go_gc_duration_seconds{quantile="1"} 2.609688464
go_gc_duration_seconds_sum 25.650164805
go_gc_duration_seconds_count 497
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
...
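
If you want to inspect this output programmatically, the text format can be read line by line. The following is a minimal sketch, not a full parser: it skips comment lines, ignores optional timestamps, and does not unescape label values. The name parse_metrics is hypothetical.

```python
import re

# Matches sample lines such as: go_gc_duration_seconds{quantile="0.5"} 0.002005745
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches individual label pairs such as: quantile="0.5"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from Prometheus text output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```

Passing the body returned by curl to parse_metrics yields entries such as ("go_goroutines", {}, 42.0).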

Network Access

If the agent listens on a non-loopback address (e.g., 0.0.0.0:9090), you can query metrics from other hosts, provided firewall rules allow access to the metrics port.
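
To tell a firewall or binding problem apart from an agent problem, a plain TCP connection test from the monitoring host can help. This is a sketch using only the Python standard library; the function name and the default port of 9090 are assumptions taken from the examples in this guide.

```python
import socket


def metrics_port_reachable(host, port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts
        return False
```

A False result means nothing accepted the connection, which points to the listen address or a firewall rule. A True result only shows that the port is open, not that metrics are being served.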

Key Metrics to Monitor

Control Plane Connectivity

grpc_client_handled_total - Total gRPC requests from the agent to the server that have completed
grpc_client_handling_seconds - Latency of those client requests, in seconds
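
One way to turn the request counter into a health signal is an error ratio. The sketch below assumes the counter carries a grpc_code label with values such as "OK" and "Unavailable", as is conventional for gRPC Prometheus instrumentation; verify the label names against your agent's actual output. The function name is hypothetical.

```python
def grpc_error_ratio(counts_by_code):
    """Return the fraction of completed requests whose status code is not 'OK'.

    counts_by_code maps a gRPC status code name to the corresponding
    grpc_client_handled_total value (assumed label: grpc_code).
    """
    total = sum(counts_by_code.values())
    if total == 0:
        return 0.0  # no requests observed yet
    failed = total - counts_by_code.get("OK", 0)
    return failed / total
```

Because the counter is cumulative, compute the ratio over the difference between two scrapes rather than over raw totals when you want a recent error rate.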

Resource Utilization

go_memstats_alloc_bytes - Bytes of heap memory allocated and still in use
go_goroutines - Number of goroutines that currently exist
process_cpu_seconds_total - Total user and system CPU time consumed, in seconds
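
process_cpu_seconds_total is a cumulative counter, so the useful figure is its rate of change between two scrapes. The sketch below shows that arithmetic; the function name is hypothetical, and treating a decrease as a counter reset is an assumption that mirrors how counters are usually handled.

```python
def cpu_cores_used(first_value, first_time, second_value, second_time):
    """Return average CPU cores used between two scrapes of process_cpu_seconds_total.

    Values are counter readings in CPU seconds; times are scrape timestamps in seconds.
    """
    elapsed = second_time - first_time
    if elapsed <= 0:
        raise ValueError("second scrape must be later than the first")
    delta = second_value - first_value
    if delta < 0:
        # The counter went backwards, so assume the agent restarted from zero
        delta = second_value
    return delta / elapsed
```

For example, readings of 10.0 and 13.0 CPU seconds taken 60 seconds apart give 3 / 60 = 0.05 cores on average.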

Kubernetes Runtime

See Kubernetes Metrics for guidance on monitoring the Kubernetes runtime for issues.

Troubleshooting Agent Metrics

Metrics Endpoint Not Accessible

Verify telemetry is configured:

# Check for telemetry configuration in pod args
kubectl get pods -n spirl-system -l app=spirl-agent -o jsonpath='{.items[0].spec.containers[0].args}' | grep telemetry-metrics-api-listen-addr

Test the endpoint directly:

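# In one shell, start the port-forward (it blocks until you press Ctrl+C)
kubectl port-forward -n spirl-system <agent-pod-name> 9090:9090
# In a separate shell, query the metrics endpoint
curl http://localhost:9090/metrics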
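
The same check can be scripted, for example as part of a smoke test. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the default URL assumes the port-forward from the commands above is active.

```python
import urllib.request


def check_metrics_endpoint(url="http://localhost:9090/metrics", timeout=5.0):
    """Fetch the metrics endpoint and return the number of sample lines found.

    Raises urllib.error.URLError if the endpoint cannot be reached and
    RuntimeError if it responds without any metric samples.
    """
    # Bypass any configured HTTP proxy, since the endpoint is expected to be local
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    samples = [line for line in body.splitlines()
               if line.strip() and not line.startswith("#")]
    if not samples:
        raise RuntimeError("endpoint responded but returned no metric samples")
    return len(samples)
```

A connection error here usually means the port-forward is not running or the agent is not listening on the expected port.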

Next Steps

Server Metrics - Configure metrics for Trust Domain Servers
Review All Metrics - Complete metrics reference
