Research & Papers

DGPO: RL-Steered Graph Diffusion for Neural Architecture Generation

The new method matches known benchmark optima and extrapolates 7.3 percentage points beyond the performance ceiling of its training data.

Deep Dive

A research team has introduced DGPO (Directed Graph Policy Optimization), a novel method that combines reinforcement learning (RL) with discrete graph diffusion to automate the design of neural network architectures. The core innovation addresses a critical limitation: existing graph diffusion models are designed for undirected structures, discarding the directional information essential for neural architectures, which are directed acyclic graphs (DAGs) where edge direction encodes data flow. DGPO solves this by incorporating topological node ordering and positional encoding, enabling RL to effectively steer the diffusion process toward architectures with desired performance metrics.
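
To make the directionality mechanism concrete, here is a minimal, self-contained sketch of those two ingredients: a topological ordering of a DAG's nodes (computed here with Kahn's algorithm, one common choice) and a sinusoidal encoding of each node's position in that order. This is an illustrative reconstruction, not the authors' implementation, and the 4-node cell at the bottom is a made-up example.

    # Illustrative sketch (not the paper's code): topological sort of a DAG
    # plus a positional encoding over that order. These are the direction-aware
    # ingredients that let a graph diffusion model respect edge direction.
    import math

    def topological_order(num_nodes, edges):
        """Kahn's algorithm: order nodes so every directed edge points forward."""
        indegree = [0] * num_nodes
        successors = [[] for _ in range(num_nodes)]
        for u, v in edges:  # edge u -> v encodes data flow
            successors[u].append(v)
            indegree[v] += 1
        frontier = [n for n in range(num_nodes) if indegree[n] == 0]
        order = []
        while frontier:
            n = frontier.pop()
            order.append(n)
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    frontier.append(m)
        assert len(order) == num_nodes, "cycle detected: not a DAG"
        return order

    def positional_encoding(position, dim=8):
        """Standard sinusoidal encoding of a node's topological position."""
        return [math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
                else math.cos(position / 10000 ** ((i - 1) / dim))
                for i in range(dim)]

    # Toy 4-node cell: 0 -> 1 -> 3 and 0 -> 2 -> 3.
    edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
    order = topological_order(4, edges)
    node_features = {n: positional_encoding(rank) for rank, n in enumerate(order)}
    print(order)  # e.g. [0, 2, 1, 3]: every edge points forward in this list

Because every edge points forward in the ordering, a node's encoded position carries exactly the flow information that an undirected diffusion model would discard.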

The model was validated on standard NAS benchmarks. On NAS-Bench-201, DGPO matched the known optimal accuracy on all three image classification tasks: 91.61% on CIFAR-10, 73.49% on CIFAR-100, and 46.77% on ImageNet-16-120. The most striking results concern data efficiency and generalization. Pretrained on only 7% of the possible search space, the model generated architectures within 0.32 percentage points of a model trained on the full dataset, and it extrapolated 7.3 percentage points beyond the performance ceiling of its own training data, evidence of learned, transferable structural priors. Control experiments confirmed the steering was genuine: inverse optimization, which steers generation toward low-performing architectures, yielded only 9.5% accuracy.

This work, submitted to IJCNN 2026, provides a controllable generative framework for combinatorial optimization over directed structures. It moves beyond traditional NAS methods by using a generative model that an RL policy can guide to explore the vast space of possible neural network designs efficiently. The implications for automating AI development are significant: by learning general design principles from limited data, the approach could sharply reduce the computational cost of architecture search.
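
As a rough intuition for the steering loop, the toy sketch below uses a plain REINFORCE update to push a categorical distribution over edge operations toward higher reward. The reward function is a stand-in for a performance predictor, and the one-step policy compresses the multi-step denoising sampler that DGPO steers, so this illustrates the principle rather than the paper's algorithm.

    # Toy sketch of RL steering (illustrative, not DGPO's objective): REINFORCE
    # nudges a categorical policy over edge operations toward a stand-in reward.
    import numpy as np

    rng = np.random.default_rng(0)
    NUM_OPS, NUM_EDGES, LR = 3, 4, 0.5
    logits = np.zeros((NUM_EDGES, NUM_OPS))  # learnable policy parameters
    baseline = 0.0                           # running reward baseline

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def reward(ops):
        # Hypothetical predictor: pretend operation 2 is best on every edge.
        return float(np.mean(ops == 2))

    for _ in range(300):
        probs = softmax(logits)
        ops = np.array([rng.choice(NUM_OPS, p=probs[e]) for e in range(NUM_EDGES)])
        r = reward(ops)
        advantage = r - baseline
        baseline += 0.1 * (r - baseline)          # moving-average baseline
        grad_logp = np.eye(NUM_OPS)[ops] - probs  # d log p(ops) / d logits
        logits += LR * advantage * grad_logp      # REINFORCE ascent step

    print(softmax(logits).argmax(axis=-1))  # should settle on op 2 for every edge

Flipping the reward's sign steers sampling toward low-scoring operations instead, which is the logic behind the paper's inverse-optimization control.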

Key Points
  • Matches known optima on NAS-Bench-201: 91.61% on CIFAR-10, 73.49% on CIFAR-100, and 46.77% on ImageNet-16-120.
  • Achieves near-oracle performance after pretraining on only 7% of the search space, extrapolating 7.3 percentage points beyond its training data's performance ceiling.
  • Introduces topological ordering to handle directed acyclic graphs (DAGs), a crucial advance over undirected graph diffusion models for neural architecture search.

Why It Matters

Dramatically reduces the data and compute needed for neural architecture search, automating and accelerating the design of efficient AI models.