Open Source

[Qwen Meetup] Function Calling Harness with Qwen: Turning 6.75% into 100%

A new validation system turned Qwen's 6.75% function calling success rate into a perfect 100%.

Deep Dive

At the Qwen Meetup Korea, developer jhnam88 demonstrated a breakthrough in making AI function calling reliable for complex, deeply recursive union types, a task the industry generally considers unworkable. The presentation centered on AutoBe, an AI agent that generates backend code not as text, but as Abstract Syntax Tree (AST) data via function calling. Initial tests were bleak: the Qwen3-coder-next model had a first-try success rate of just 6.75%, and the entire Qwen 3.5 model family failed completely (0%) due to a consistent double-stringify bug in its function-calling output.
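A double-stringify bug means the model JSON-encodes its function-call arguments twice, so a single `JSON.parse` yields a string rather than an object. The talk does not publish the exact recovery code, but a minimal sketch of a lenient parser that tolerates this failure mode (the function name `parseArguments` is illustrative) could look like:

```typescript
// Recover from a "double-stringify" bug: the model emits function-call
// arguments as a JSON string whose contents are themselves JSON, so one
// JSON.parse produces a string instead of the intended object.
function parseArguments(raw: string): unknown {
  let value: unknown = JSON.parse(raw);
  // If parsing produced a string, the payload was stringified twice;
  // parse once more to recover the intended structure.
  if (typeof value === "string") {
    value = JSON.parse(value);
  }
  return value;
}
```

A harness like this accepts both well-formed and double-stringified payloads, which is what allows a model family with a 0% first-try rate to participate in the correction loop at all.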

The solution hinged on a robust validation infrastructure called Typia. This system uses a single type definition to automate the creation of a JSON schema, parser, validator, and, crucially, a precise feedback generator for the AI. By implementing lenient JSON parsing with type coercion and detailed validation feedback, Typia enabled a self-healing loop within AutoBe's 4-tier compiler validation system. This mechanical verification process allowed the models to iteratively correct their outputs, transforming the initial success rates of 6.75% and 0% into a deterministic 100% for generating correct code structures.
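The self-healing loop described above can be sketched as a retry cycle: validate the model's structured output against the schema, and on failure feed the path-level errors back so the next attempt can correct them. The interfaces and names below (`ValidationError`, `selfHeal`, the error-path format) are illustrative assumptions, not AutoBe's or Typia's actual API:

```typescript
// Hypothetical shape of the feedback a validator hands back to the model.
interface ValidationError {
  path: string;     // e.g. "$input.columns[2].type"
  expected: string; // the type the schema requires at that path
  value: unknown;   // what the model actually produced
}

interface ValidationResult<T> {
  success: boolean;
  data?: T;
  errors: ValidationError[];
}

// "generate" stands in for a model call that receives prior validation
// errors as feedback; "validate" stands in for a schema validator.
function selfHeal<T>(
  generate: (feedback: ValidationError[]) => T,
  validate: (input: T) => ValidationResult<T>,
  maxRetries = 3,
): T {
  let feedback: ValidationError[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const candidate = generate(feedback); // model produces AST data
    const result = validate(candidate);   // mechanical verification
    if (result.success) return result.data!;
    feedback = result.errors;             // precise, path-level feedback
  }
  throw new Error("validation still failing after retries");
}
```

The key design point is that the feedback is mechanical and exact (a path, an expected type, the offending value) rather than a prose complaint, which is what makes the loop converge deterministically.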

Key Points
  • AutoBe agent uses function calling to generate backend code as AST data, not text, achieving 100% success on complex types.
  • The Typia infrastructure automated schema, parsing, and validation, providing precise feedback to fix Qwen's 0% success rate from a double-stringify bug.
  • The system uses a 4-tier compiler validation with self-healing loops, proving small models can expose system vulnerabilities larger models mask.

Why It Matters

This proves deterministic, verifiable AI code generation is possible, moving beyond probabilistic outputs to reliable engineering tools.