Nf-PEAK measures per-task energy on Kubernetes with 6.6% error
New tool attributes CPU and DRAM energy to individual workflow tasks with high accuracy.
Scientific workflows executed on shared Kubernetes clusters via engines like Nextflow face a critical challenge: understanding energy consumption at the task level. Node-level counters (e.g., Intel RAPL) provide only aggregate data, access to host process information is often restricted, and concurrent workloads introduce noise. This lack of granularity hampers efforts to optimize for cost and sustainability, as tasks can be highly heterogeneous in resource usage.
Nf-PEAK addresses this with a four-step pipeline that (i) identifies workflow pods, (ii) maps pods to host processes using cgroup metadata, (iii) samples RAPL and per-process performance counters, and (iv) applies a non-linear energy-credit model to attribute CPU-package and DRAM energy at the task level. The method runs entirely within containers, bypassing common access restrictions on shared clusters.
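To make steps (ii) through (iv) more concrete, here is a minimal Python sketch. It is not Nf-PEAK's implementation, and the paper's non-linear energy-credit model is not described in this summary. The function names, the `exponent` parameter, and the simple share-based split are all assumptions for illustration. The cgroup path layouts assumed here are the ones commonly produced by the cgroupfs and systemd cgroup drivers.

```python
import re
from typing import Dict, Optional

# Pod UID as it commonly appears in cgroup paths: ".../pod<uid>/..." (cgroupfs
# driver) or "...-pod<uid_with_underscores>.slice" (systemd driver).
_POD_UID = re.compile(r"pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})")


def pod_uid_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Extract a pod UID from the contents of /proc/<pid>/cgroup, if present."""
    match = _POD_UID.search(cgroup_text)
    return match.group(1).replace("_", "-") if match else None


def rapl_delta_joules(prev_uj: int, curr_uj: int, max_range_uj: int) -> float:
    """Energy between two microjoule counter readings, allowing one wraparound."""
    diff = curr_uj - prev_uj
    if diff < 0:  # the counter wrapped between the two samples
        diff += max_range_uj
    return diff / 1_000_000


def attribute_energy(
    node_joules: float,
    task_activity: Dict[str, float],
    exponent: float = 1.0,  # hypothetical knob; 1.0 gives a linear split
) -> Dict[str, float]:
    """Split node energy across tasks by each task's share of activity.

    Placeholder for the paper's energy-credit model. This version assigns all
    measured energy (including idle power) to the listed tasks.
    """
    credits = {task: activity ** exponent for task, activity in task_activity.items()}
    total = sum(credits.values())
    if total == 0:
        return {task: 0.0 for task in task_activity}
    return {task: node_joules * credit / total for task, credit in credits.items()}
```

For example, `attribute_energy(10.0, {"align": 3.0, "sort": 1.0})` assigns 7.5 J to `align` and 2.5 J to `sort`. A real attribution model would also have to decide how to treat idle power and energy used by non-workflow processes.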
Evaluated on a Kubernetes cluster running three nf-core workflows, Nf-PEAK achieved a Mean Absolute Percentage Error of 6.6% in isolated runs and 10.9% when an unrelated workload saturated 8 of 32 hardware threads per node. Performance remained stable across clusters of 2, 3, 4, and 8 nodes. Compared to the leading Kubernetes energy monitoring tool Kepler, Nf-PEAK yielded significantly lower average error, particularly under co-located load, demonstrating robustness in realistic multi-tenant environments.
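For readers unfamiliar with the metric, Mean Absolute Percentage Error averages the relative deviation of each estimate from its reference value and expresses it as a percentage. The sketch below uses the standard definition. The function name and the choice of per-task energy values as inputs are assumptions, since this summary does not specify how the paper aggregates errors.

```python
from typing import Sequence


def mape(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Mean Absolute Percentage Error of estimates against reference values."""
    if len(estimated) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    relative_errors = (abs(e - r) / abs(r) for e, r in zip(estimated, reference))
    return 100.0 * sum(relative_errors) / len(reference)
```

With estimates of 110 J and 180 J against reference values of 100 J and 200 J, each estimate is off by 10%, so the MAPE is about 10%.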
This work, accepted at IEEE CLOUD 2026, provides a practical solution for researchers and engineers to pinpoint energy-hungry tasks in their pipelines. By enabling process-level attribution, Nf-PEAK opens the door to targeted optimizations, such as rescheduling or re-engineering specific tasks, that reduce both energy costs and carbon footprint without requiring cluster-wide changes.
- Nf-PEAK maps Kubernetes pods to host processes via cgroup metadata to attribute energy per task.
- Achieves a MAPE of 6.6% in isolated runs and 10.9% under co-located CPU load, with stable performance from 2 to 8 nodes.
- Outperforms Kepler, especially under resource contention, with lower average error across nf-core workflows.
Why It Matters
Enables precise per-task energy attribution on Kubernetes, key for reducing cost and carbon in scientific HPC workflows.