LocalAI v4.3.0 adds keyless backend signing and instant prompt caching
Repeated system prompts now process in seconds instead of minutes, and admins can track GPU usage per API key.
LocalAI v4.3.0 introduces a major security upgrade: backend OCI images are now signed with keyless cosign signatures using sigstore-go and Fulcio + Rekor, and verified per gallery via a verification: policy. An opt-in strict mode (--require-backend-integrity) fails closed if signatures are missing or revoked, closing a trust gap in which gallery YAML instructed pulls without byte verification. The llama-cpp server-side prompt cache is now enabled by default, cutting repeated system prompt processing from 5–8 minutes to seconds, which matters most for agents, coding assistants, and CLI tools with long instructions. No YAML edits are required.
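To make the fail-closed idea concrete, here is a minimal Python sketch of the decision a strict mode might make. It does not call sigstore or reflect LocalAI's actual code: SignatureInfo, find_problem, and allow_backend are hypothetical names, and the lenient (non-strict) behavior of allowing the install anyway is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SignatureInfo:
    """Hypothetical result of a signature check (not LocalAI's real type)."""
    valid: bool            # signature and certificate identity verified
    signed_at: datetime    # signing time taken from the transparency log


def find_problem(sig: Optional[SignatureInfo],
                 not_before: Optional[datetime]) -> Optional[str]:
    """Return a short reason string if the image cannot be trusted, else None."""
    if sig is None:
        return "missing"
    if not sig.valid:
        return "invalid"
    if not_before is not None and sig.signed_at < not_before:
        # Signed before the policy cutoff: treated as revoked.
        return "revoked"
    return None


def allow_backend(sig: Optional[SignatureInfo],
                  not_before: Optional[datetime],
                  strict: bool) -> bool:
    """Strict mode fails closed; lenient mode (assumed) lets the install proceed."""
    problem = find_problem(sig, not_before)
    if problem is None:
        return True
    return not strict


cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
old_sig = SignatureInfo(valid=True, signed_at=datetime(2024, 6, 1, tzinfo=timezone.utc))
print(allow_backend(old_sig, cutoff, strict=True))
```

The point of the design is that an unsigned or revoked image is refused outright in strict mode instead of being installed with a warning.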
Usage tracking gains a per-API-key and per-user Sources view, letting admins pinpoint exactly who is consuming GPU cycles; revoked keys remain readable in history. Distributed mode (v3) improves with per-request replica routing, cached probeHealth, async per-node installs with streaming progress, and a unified backend-logs endpoint. The Admin Traces UI now caps large payloads via LOCALAI_TRACING_MAX_BODY_BYTES so that embedding payloads do not flood the view. For Jetson, L4T13 (cu130/aarch64) backends are restored, switching to PyPI-provided wheels for vllm, sglang, and vllm-omni. A Nix flake enables dockerless setups for NixOS users.
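As an illustration of the trace payload cap, the Python sketch below reads LOCALAI_TRACING_MAX_BODY_BYTES from the environment and truncates oversized bodies. Only the variable name comes from the release. The default value, the truncation marker, and the function names are assumptions for this example and may not match LocalAI's behavior.

```python
import os

DEFAULT_MAX_BODY_BYTES = 4096  # hypothetical default, chosen for this sketch only


def max_body_bytes() -> int:
    """Read the cap from the environment, falling back to the sketch default."""
    raw = os.environ.get("LOCALAI_TRACING_MAX_BODY_BYTES", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_BODY_BYTES
    return value if value > 0 else DEFAULT_MAX_BODY_BYTES


def cap_body(body: bytes, limit: int) -> bytes:
    """Keep at most `limit` bytes and append a note saying how many were dropped."""
    if len(body) <= limit:
        return body
    dropped = len(body) - limit
    return body[:limit] + f"... [{dropped} bytes truncated]".encode()


# A large embedding-style payload is cut down before it would reach a trace view.
payload = b"0.123," * 10_000
print(len(cap_body(payload, max_body_bytes())))
```

Capping at capture time, not at display time, keeps stored traces small as well as readable.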
- Keyless cosign signatures verify backend images via CI (Fulcio + Rekor) with opt-in strict mode and not_before revocation.
- llama-cpp prompt cache enabled by default: repeated system prompts drop from minutes to seconds without configuration.
- New Sources tab provides per-API-key and per-user GPU usage tracking, plus distributed v3 with per-request replica routing.
Why It Matters
Boosts security, slashes latency for agents, and gives admins fine-grained GPU usage oversight.