Robotics

GPU not used in my evaluator?

Participants report that NVIDIA L4 GPUs are not being utilized during the critical competition evaluation phase.

Deep Dive

Participants in Intrinsic's AI for Industry Challenge are reporting a significant technical hurdle: their evaluation containers are not using the available GPU hardware. The issue shows up in Step 3 of the official 'getting_started' guide, where users check GPU utilization with the `nvidia-smi` command and see 'No running processes found', even though NVIDIA L4 GPUs with 23 GB of memory are available. This happens even when the evaluation container was created as instructed, using `distrobox create` with the `--nvidia` flag and the official `ghcr.io/intrinsic-dev/aic/aic_eval:latest` image.
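The `nvidia-smi` check described above can also be scripted, which makes it easier to repeat while the evaluator is running. The following is a minimal sketch, not part of the challenge's guide: the helper names `parse_compute_apps` and `list_compute_apps` are hypothetical, and it assumes `nvidia-smi` is on the `PATH` of whatever environment runs it.

```python
import subprocess

# Standard nvidia-smi query options; the helper names below are hypothetical.
COMPUTE_APPS_QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse CSV rows of 'pid, process_name, used_memory' into dicts."""
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from both ends so a comma inside the process name is tolerated.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        # used_memory is kept as a string because it can be "[N/A]".
        apps.append({"pid": int(pid), "name": name.strip(), "used_mib": mem.strip()})
    return apps


def list_compute_apps():
    """Run nvidia-smi and return the compute processes it reports."""
    result = subprocess.run(
        COMPUTE_APPS_QUERY, capture_output=True, text=True, check=True
    )
    return parse_compute_apps(result.stdout)
```

Calling `list_compute_apps()` while the evaluator runs returns an empty list in the situation participants describe, which is the scripted equivalent of 'No running processes found'. An empty list only says that `nvidia-smi` reports no compute processes from where it was run. It does not by itself explain why.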

The problem appears to be related to the container's runtime environment on Debian GNU/Linux 13 (trixie) systems. Participants report following the prescribed setup, with NVIDIA driver version 595.45.04 and CUDA 13.2, yet the evaluator process does not appear to attach to the GPU. This prevents the execution of GPU-accelerated tasks that are crucial for the competition's evaluation phase. The issue has sparked discussion in the challenge's forums, where several related topics about container startup problems have appeared, but at the time of these reports neither the community nor the organizers had confirmed a solution.
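Since the reports point at the container's runtime environment, one way to narrow things down is to record what `nvidia-smi` sees on the host and inside the container, then compare the two. The sketch below is an illustration under assumptions rather than a confirmed fix: the helper names `parse_gpu_rows` and `query_gpus` are hypothetical, and it assumes GPU names contain no commas.

```python
import subprocess

# Standard nvidia-smi GPU query fields; the helper names below are hypothetical.
GPU_QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Parse one CSV row per GPU into a dict of strings."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Assumes exactly four comma-separated fields per GPU.
        name, driver, mem_total, util = [field.strip() for field in line.split(",")]
        rows.append(
            {
                "name": name,
                "driver_version": driver,
                "memory_total_mib": mem_total,
                "utilization_pct": util,
            }
        )
    return rows


def query_gpus():
    """Run nvidia-smi and return the GPUs visible from this environment."""
    result = subprocess.run(GPU_QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

Running `query_gpus()` once on the host and once inside the evaluation container shows whether both environments see the same GPU and driver version. If the container lists no GPU at all, the problem lies in GPU passthrough and not in the evaluator process.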

Key Points
  • nvidia-smi reports 'No running processes found' on NVIDIA L4 GPUs during evaluator execution
  • Issue occurs specifically in Step 3 of Intrinsic's AI for Industry Challenge guide
  • Problem persists despite proper container creation with --nvidia flag and official aic_eval image

Why It Matters

This technical blocker prevents participants from properly evaluating their AI models, potentially affecting competition fairness and outcomes.