[Project] I bypassed NemoClaw's sandbox isolation to run a fully local agent (Nemotron 9B + tool calling) on a single RTX 5090
A developer hacked NVIDIA's enterprise AI sandbox to run a fully local agent with custom tool-calling, bypassing cloud restrictions.
A developer has reverse-engineered NVIDIA's newly launched NemoClaw enterprise sandbox to enable fully local AI agent execution. NVIDIA introduced NemoClaw at GTC as a secure container environment for AI agents, built on OpenShell with k3s, Landlock, and seccomp security layers. By default, the system restricts local networking and expects connections to cloud APIs, but the developer wanted 100% local inference with Nemotron 9B on an RTX 5090 inside WSL2.
The workaround involved three layers of network manipulation. First, iptables rules on the host were configured to allow traffic from the Docker bridge to reach a local vLLM instance on port 8000. Next, a custom Python TCP relay running in the pod's main network namespace bridged the sandbox's virtual Ethernet interface to the Docker bridge. Finally, the developer used nsenter to inject an ACCEPT rule into the sandbox's OUTPUT chain, overriding the default REJECT policy that blocked local connections.
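The post does not include the relay's source, but a minimal version of such a Python TCP relay can be sketched as follows. The listen and upstream addresses here are assumptions for illustration (172.17.0.1 is merely the conventional Docker bridge gateway); the actual addresses depend on the sandbox's network layout.

```python
# Minimal TCP relay sketch: accepts connections on the pod-facing side and
# forwards each one to the vLLM server reachable via the Docker bridge.
import socket
import threading

LISTEN_ADDR = ("0.0.0.0", 8000)       # pod-facing listener (assumed)
UPSTREAM_ADDR = ("172.17.0.1", 8000)  # Docker bridge gateway -> host vLLM (assumed)

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the source hits EOF, then half-close dst."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def handle(client: socket.socket) -> None:
    upstream = socket.create_connection(UPSTREAM_ADDR)
    # Relay both directions concurrently so streaming (SSE) responses flow
    # back while the request body is still being sent.
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)
    client.close()

def serve() -> None:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN_ADDR)
    srv.listen()
    while True:
        client, _ = srv.accept()
        threading.Thread(target=handle, args=(client,), daemon=True).start()

# To run: serve()  (blocks; spawns one handler thread per connection)
```

A userspace relay like this sidesteps routing between the veth pair and the bridge, but it still needs the iptables changes described above, since the sandbox's own OUTPUT chain would otherwise reject the connection.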
Beyond network access, the developer also built a custom Gateway to handle tool-calling translation. Nemotron 9B emits tool calls as XML-style tags (<TOOLCALL>[...]</TOOLCALL>), which must be converted to the OpenAI-compatible tool_calls format. The Gateway intercepts the streaming Server-Sent Events responses from vLLM, buffers them, parses the tags, and rewrites them in real time. This lets the opencode framework inside the sandbox use Nemotron as a fully autonomous agent capable of executing terminal commands and other actions.
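The Gateway's code is not shown in the post, but the core translation step can be sketched. This sketch assumes the tag wraps a JSON list of objects with "name" and "arguments" keys, operates on the fully buffered model output rather than the SSE stream, and generates its own call IDs; all of those details are assumptions, not the developer's actual implementation.

```python
# Sketch: rewrite Nemotron's <TOOLCALL>[...]</TOOLCALL> span into an
# OpenAI-style assistant message carrying a tool_calls array.
import json
import re
import uuid

TOOLCALL_RE = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)

def translate(buffered_text: str) -> dict:
    m = TOOLCALL_RE.search(buffered_text)
    if not m:
        # No tool call: pass the text through as a normal assistant message.
        return {"role": "assistant", "content": buffered_text}
    calls = json.loads(m.group(1))  # assumed: [{"name": ..., "arguments": {...}}, ...]
    tool_calls = [
        {
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": c["name"],
                # OpenAI-compatible clients expect arguments as a JSON *string*.
                "arguments": json.dumps(c.get("arguments", {})),
            },
        }
        for c in calls
    ]
    # Any prose outside the tag is kept as regular content.
    content = TOOLCALL_RE.sub("", buffered_text).strip() or None
    return {"role": "assistant", "content": content, "tool_calls": tool_calls}
```

In a real streaming Gateway this logic would sit behind an SSE buffer that holds chunks until the closing </TOOLCALL> arrives, then re-emits the rewritten deltas to the client.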
While the setup is currently fragile (a WSL2 reboot wipes the iptables modifications), it is a significant proof of concept for running enterprise-grade AI agents entirely locally. The developer plans to release the code on GitHub after cleanup, potentially giving other practitioners a blueprint for adapting cloud-oriented enterprise AI systems to private, on-premises deployments where data privacy and latency are critical concerns.
- Bypassed NVIDIA's NemoClaw sandbox isolation using iptables manipulation and a custom TCP relay to enable local vLLM connections
- Built a custom Gateway that translates Nemotron 9B's XML-style tool calls (<TOOLCALL>) to OpenAI-compatible format in real-time
- Demonstrates fully local AI agent execution on a single RTX 5090 with no data leaving the machine, challenging cloud-only enterprise AI assumptions
Why It Matters
Shows how enterprise AI systems can be adapted for private, on-device agent workflows, addressing data privacy and latency concerns in regulated industries.