SPIRL Agent Metrics

This guide covers metrics collection, configuration, and monitoring for SPIRL Agents. Agents expose Prometheus-compatible metrics for workload identity delivery, control plane connectivity, and resource utilization.

Enabling Metrics

Agents expose metrics on a configurable listen address (default: :9090). Metrics are disabled by default to reduce overhead when not used.

Enable metrics in your Helm values file:

agent-values.yaml
telemetry:
  enabled: true
  collectors:
    grpc:
      emmitLatencyMetrics: false # Keep disabled unless debugging
  metricsAPI:
    listenAddr: ":9090"
health:
  listenAddr: ":8080" # Health check endpoint (optional)

Apply the configuration:

helm upgrade --install spirl-system \
  oci://ghcr.io/spirl/charts/spirl-system \
  --values agent-values.yaml

Verifying Metrics Endpoint

Test that metrics are accessible:

# Locate an agent pod
kubectl -n spirl-system get po -l app=spirl-agent

# Port-forward to an agent pod (replace xxxxx with a pod name from above)
# The port-forward blocks the shell while it runs; press Ctrl+C to end the session
kubectl port-forward -n spirl-system spirl-agent-xxxxx 9090:9090

# In a separate shell, query the metrics endpoint
curl http://localhost:9090/metrics

Example:

> kubectl -n spirl-system get po -l app=spirl-agent
NAME READY STATUS RESTARTS AGE
spirl-agent-cw78f 1/1 Running 0 2d2h

> kubectl port-forward -n spirl-system spirl-agent-cw78f 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

> curl http://localhost:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000582792
go_gc_duration_seconds{quantile="0.25"} 0.00085675
go_gc_duration_seconds{quantile="0.5"} 0.002005745
go_gc_duration_seconds{quantile="0.75"} 0.088068597
go_gc_duration_seconds{quantile="1"} 2.609688464
go_gc_duration_seconds_sum 25.650164805
go_gc_duration_seconds_count 497
...
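
The full output is long, so it can help to narrow it to the metric families you care about. The following is a minimal sketch; filter_metrics is a hypothetical helper name, not part of SPIRL or kubectl. It reads the endpoint output on stdin, drops the HELP and TYPE comment lines, and keeps samples whose names start with any of the given prefixes.

```shell
# filter_metrics: hypothetical helper (not shipped with SPIRL).
# Reads Prometheus text format on stdin, drops "# HELP" / "# TYPE" lines,
# and keeps sample lines whose metric name starts with one of the prefixes
# passed as arguments.
filter_metrics() {
  # Join the prefixes into an alternation, e.g. "grpc_client_|go_goroutines"
  local pattern
  pattern=$(IFS='|'; printf '%s' "$*")
  grep -v '^#' | grep -E "^(${pattern})"
}

# Usage (with the port-forward from above still running):
#   curl -s http://localhost:9090/metrics | filter_metrics grpc_client_ go_goroutines
```

The function exits non-zero when nothing matches, which makes it usable in a simple script check.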

Key Metrics to Monitor

Control Plane Connectivity

  • grpc_client_handled_total - Total gRPC requests the agent has completed against the control plane, regardless of outcome
  • grpc_client_handling_seconds - Latency of those client requests, in seconds
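
For a quick connectivity check without a Prometheus server, the request counter can be summarized directly from a scrape. This is a sketch under one assumption: that each grpc_client_handled_total sample carries a grpc_code label (for example grpc_code="OK"), as the common Go gRPC Prometheus interceptors emit. The agent's actual label set may differ, and grpc_non_ok_ratio is a hypothetical helper name.

```shell
# grpc_non_ok_ratio: hypothetical helper (not shipped with SPIRL).
# Reads Prometheus text format on stdin and reports how many completed
# client RPCs finished with a status other than OK.
# Assumes samples carry a grpc_code label and have no trailing timestamp.
grpc_non_ok_ratio() {
  awk '
    /^grpc_client_handled_total[{ ]/ {
      value = $NF                       # sample value is the last field
      total += value
      if ($0 !~ /grpc_code="OK"/) failed += value
    }
    END {
      if (total == 0) { print "no grpc_client_handled_total samples"; exit }
      printf "total=%d non_ok=%d ratio=%.4f\n", total, failed, failed / total
    }
  '
}

# Usage (with a port-forward to the agent running):
#   curl -s http://localhost:9090/metrics | grpc_non_ok_ratio
```

The counters are cumulative since the agent process started, so the ratio describes the whole process lifetime rather than recent behavior. Compare two scrapes taken some time apart to see a current trend.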

Resource Utilization

  • go_memstats_alloc_bytes - Bytes of heap memory currently allocated and in use
  • go_goroutines - Number of goroutines that currently exist
  • process_cpu_seconds_total - Total user and system CPU time consumed, in seconds
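
These three values can be read from a single scrape in a compact form. The sketch below uses a hypothetical helper name, resource_snapshot, and converts the allocation figure to MiB for readability; the output field names are illustrative only.

```shell
# resource_snapshot: hypothetical helper (not shipped with SPIRL).
# Reads Prometheus text format on stdin and prints the three resource
# metrics listed above, one per line, in the order they appear in the input.
resource_snapshot() {
  awk '
    $1 == "go_memstats_alloc_bytes"   { printf "heap_alloc_mib=%.1f\n", $2 / 1048576 }
    $1 == "go_goroutines"             { printf "goroutines=%d\n", $2 }
    $1 == "process_cpu_seconds_total" { printf "cpu_seconds=%.2f\n", $2 }
  '
}

# Usage (with a port-forward to the agent running):
#   curl -s http://localhost:9090/metrics | resource_snapshot
```

Matching on the exact metric name avoids picking up similarly named series such as go_memstats_alloc_bytes_total.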

Kubernetes Runtime

See Kubernetes Metrics for guidance on monitoring the Kubernetes runtime for issues.

Troubleshooting Agent Metrics

Metrics Endpoint Not Accessible

Verify telemetry is configured:

# Check for telemetry configuration in pod args
kubectl get pods -n spirl-system -l app=spirl-agent -o jsonpath='{.items[0].spec.containers[0].args}' | grep telemetry-metrics-api-listen-addr
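
The jsonpath query prints the whole args array on one line, so grep returns every argument at once. As a sketch, a small hypothetical helper, list_telemetry_args, can split the array and print only the telemetry-related entries. It assumes kubectl prints the array as JSON (for example ["a","b"]) and that no argument contains a comma.

```shell
# list_telemetry_args: hypothetical helper (not shipped with SPIRL).
# Reads a JSON array of container args on stdin, prints one argument per
# line, and keeps only those mentioning "telemetry".
list_telemetry_args() {
  sed -e 's/^\[//' -e 's/\]$//' | tr ',' '\n' | tr -d '"' | grep -- 'telemetry'
}

# Usage:
#   kubectl get pods -n spirl-system -l app=spirl-agent \
#     -o jsonpath='{.items[0].spec.containers[0].args}' | list_telemetry_args
```

No output and a non-zero exit status suggest that no telemetry arguments were passed to the agent container.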

Test the endpoint directly:

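# Port-forward to the agent pod (blocks the shell; press Ctrl+C to stop)
kubectl port-forward -n spirl-system <agent-pod-name> 9090:9090

# In a separate shell, query the metrics endpoint
curl http://localhost:9090/metrics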

Next Steps