Resource Sizing
This page collects the current per-component memory and CPU recommendations for a Defakto deployment on Kubernetes. The numbers below come from a combination of internal testing and production metrics.
Resource needs depend on the underlying platform (kernel version, page size, Go runtime version), workload count, attestation frequency, SVID rotation rate, and per-node pod density. Use these values as a starting point, then adjust based on the metrics in Kubernetes Platform Monitoring and the per-component guides linked below.
Trust Domain Server
Based on internal testing, the following requests and limits are a reasonable starting point. Add these values to your Trust Domain Server Helm values.yaml — see Deploy Trust Domain Servers for the full file structure.
trustDomainDeployment:
  deployment:
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "1000m"
Server memory usage scales with the number of connected agents. For autoscaling guidance, see Trust Domain Server Metrics — Resource Management and Autoscaling.
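Because memory grows with agent count, a memory-based HorizontalPodAutoscaler is a natural fit. The sketch below is illustrative only: the Deployment name, namespace, and replica bounds are assumptions, and the autoscaling guide linked above is the authoritative reference.

# Sketch: scale a hypothetical "trust-domain-server" Deployment on memory.
# Name, namespace, replica bounds, and threshold are all assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trust-domain-server
  namespace: spirl-system        # assumed namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: trust-domain-server    # assumed Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # scale out before average usage passes ~70% of the request

Note that HPA resource utilization is measured against the request, not the limit, so the requests above must be set for this to work.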
Agent
These initial settings are based on real-world operations, but we recommend adjusting them for the number of nodes, the per-node pod density, and the capabilities of the underlying hardware.
agent:
  resources:
    requests:
      memory: "64Mi"
      cpu: "25m"
    limits:
      memory: "256Mi"
      cpu: "125m"
CSI Driver
These resource settings work across a broad range of workloads and scaling profiles, but we recommend using your own metrics to adjust them as necessary.
csiDriver:
  resources:
    requests:
      cpu: "50m"
      memory: "64Mi"
    limits:
      cpu: "150m"
      memory: "196Mi"
The fix in CSI driver 0.2.11 reduces peak memory in the mount-info parsing path that the driver hits during kubelet's periodic per-volume health checks. Use a spirl-system release that bundles CSI driver 0.2.11 or later. If your spirl-system chart's bundled CSI driver is older than 0.2.11, set images.csiDriver.tag: "0.2.11" in your Helm values until you upgrade to a release that bundles the patched driver by default.
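Expressed as a values.yaml override, that pin looks like this; remove it once you upgrade to a spirl-system release that bundles 0.2.11 or later by default.

# Pin the bundled CSI driver until the chart ships 0.2.11+ by default.
images:
  csiDriver:
    tag: "0.2.11"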
Reflector
If you are running the Reflector, use the following formula as a starting point for memory sizing:
Memory (MiB) = 50 + (0.5 × Agents) + (0.004 × DistinctSVIDs)
- 50 MiB — base memory plus offline-events buffer overhead.
- 0.5 MiB per agent — gRPC connection overhead.
- 0.004 MiB per SVID — ~4 KiB per distinct SVID.
Example: 50 agents and 100 distinct SVIDs gives 50 + (0.5 × 50) + (0.004 × 100) = 75.4 MiB.
The formula was calibrated against small-to-medium deployments. For clusters with several hundred agents or more, treat it as a lower bound and validate against observed memory usage; if you see numbers significantly higher than the formula predicts, contact Defakto Support.
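Applied to the worked example above, 75.4 MiB rounds up naturally to a 128Mi request, with a 256Mi limit for burst headroom. The reflector.resources key path in this sketch is an assumption; match it to your chart's actual structure.

# Sketch for a ~50-agent / ~100-SVID deployment; key path is assumed.
reflector:
  resources:
    requests:
      memory: "128Mi"   # formula result (75.4 MiB) rounded up
    limits:
      memory: "256Mi"   # headroom for offline-events buffering and bursts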
Common pitfalls
- No memory limit at all. The Helm charts ship without resource requests or limits set by default. In production, set both — without them, the Kubernetes scheduler can't make good placement decisions and pods become BestEffort QoS, which means they're killed first under node memory pressure.
- Sizing only against steady-state memory. Several components (notably the CSI driver) have transient burst spikes during scale-out that can exceed steady-state by 10× or more. Size against observed peaks, not averages.
Validating your sizing
After rolling out the recommended values, watch the following for one to two weeks before tightening or relaxing limits:
- Peak memory vs. limit. Track container_memory_working_set_bytes against kube_pod_container_resource_limits{resource="memory"}. Aim for peaks below 70% of the limit during normal operation. PromQL examples are in Kubernetes Platform Monitoring, and a sketch of alerting rules for these signals follows this list.
- OOM events. Track container_oom_events_total. Any non-zero count for a Defakto component means the limit is too low; increase it and continue monitoring. The runbooks (agent, server) cover the diagnostic flow.
- CPU throttling. Track container_cpu_cfs_throttled_seconds_total. Sustained throttling means the CPU limit is too low and is likely contributing to elevated request latency.
- Burst vs. steady state. During a node scale-out or autoscaling event, capture peak memory for the CSI driver and the Reflector and compare it against steady state. Components with large burst-to-steady ratios should be sized for the peak, not the average.
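For reference, here is a sketch of the first three signals as Prometheus alerting rules; the burst-vs-steady comparison is a manual exercise during scale events. The namespace="spirl-system" selector and all thresholds are assumptions, so adapt them to your deployment and prefer the PromQL in Kubernetes Platform Monitoring where it differs.

# Sketch: alerting rules for the validation signals above.
# The namespace selector and thresholds are assumptions.
groups:
  - name: defakto-resource-sizing
    rules:
      - alert: MemoryNearLimit
        # Working-set memory above 70% of the memory limit for 15 minutes.
        expr: |
          max by (pod, container) (container_memory_working_set_bytes{namespace="spirl-system", container!=""})
            /
          max by (pod, container) (kube_pod_container_resource_limits{namespace="spirl-system", resource="memory"})
            > 0.70
        for: 15m
      - alert: OOMKilled
        # Any OOM event on a Defakto component means the limit is too low.
        expr: increase(container_oom_events_total{namespace="spirl-system"}[1h]) > 0
      - alert: CPUThrottled
        # Sustained CFS throttling points at a CPU limit that is too low.
        expr: rate(container_cpu_cfs_throttled_seconds_total{namespace="spirl-system"}[5m]) > 0.1
        for: 15m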
After two weeks of clean numbers, you can tighten limits, but leave headroom so future spikes don't trigger an OOM. If you're unsure, leave the recommended values in place; the cost of over-provisioning is small relative to the cost of an OOM during a customer-visible event.