Developer Tools

Haystack v2.30.0 adds Python-aware code chunking for RAG

New splitter keeps functions intact, plus ChatGenerator now accepts plain strings.

Deep Dive

Haystack v2.30.0 from deepset adds a code-aware splitter for anyone building retrieval-augmented generation (RAG) on Python codebases. The new PythonCodeSplitter parses source files using Python's built-in `ast` module, then greedily merges logical units (docstrings, imports, top-level functions, class headers, methods, nested classes) into chunks of roughly `max_effective_lines`. This avoids the common failure of naive line-based splitting, which cuts through functions and loses structural context. For oversized functions, it falls back to line-based splitting with overlap. Two options make the resulting chunks more useful downstream: `strip_docstrings` moves docstrings to metadata, and `preserve_class_definition` prepends class signatures to method chunks.
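To make the greedy-merge idea concrete, here is a minimal sketch built only on the standard-library `ast` module. It is not Haystack's implementation: the function name `split_python_source` and the `max_lines` parameter are hypothetical. It handles top-level statements only and omits the overlap fallback for oversized functions, so a unit longer than `max_lines` simply becomes its own chunk.

```python
import ast


def split_python_source(source: str, max_lines: int = 20) -> list[str]:
    """Greedily merge top-level AST nodes into chunks of about max_lines lines.

    Illustrative only: a hypothetical helper, not the PythonCodeSplitter API.
    """
    lines = source.splitlines()
    tree = ast.parse(source)

    # Line span of each top-level statement. Decorators sit above the
    # `def`/`class` line, so pull them into the span of their owner.
    spans = []
    for node in tree.body:
        start = node.lineno
        for decorator in getattr(node, "decorator_list", []):
            start = min(start, decorator.lineno)
        spans.append((start, node.end_lineno))

    chunks = []
    cur_start = cur_end = None
    for start, end in spans:
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start + 1 <= max_lines:
            # The next unit still fits: extend the current chunk.
            cur_end = end
        else:
            # Close the current chunk and start a new one at this unit.
            chunks.append("\n".join(lines[cur_start - 1:cur_end]))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append("\n".join(lines[cur_start - 1:cur_end]))
    return chunks
```

Because chunk boundaries always fall between complete statements, every chunk produced this way can itself be parsed with `ast.parse`, which is the property that line-based splitting loses.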

Beyond the code splitter, all ChatGenerator components (including OpenAI, Azure, HuggingFace, and FallbackChatGenerator) now accept a plain string for the `messages` parameter and automatically wrap it in a `ChatMessage` with the user role. This makes switching from a Generator to a ChatGenerator a one-line change. The DALL-E image generator now uses OpenAI's new `gpt-image-2` model by default, supporting arbitrary sizes and new quality levels (auto, high, medium, low). The release also includes a deprecation note: `LLM.run` and `LLM.run_async` now require `messages` and `streaming_callback` as keyword arguments, not positional.
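The string-wrapping behavior and the keyword-only calling convention can be sketched in plain Python. Everything below is a stand-in: `Message`, `normalize_messages`, and `run` are hypothetical names used for illustration and do not reproduce Haystack's `ChatMessage` class or the `LLM.run` signature. How strictly the release enforces the keyword-only rule is not stated here; the sketch uses Python's bare `*` syntax, which rejects positional calls outright.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    """Stand-in for a chat message (hypothetical, not Haystack's ChatMessage)."""
    role: str
    text: str


def normalize_messages(messages: Union[str, list[Message]]) -> list[Message]:
    """Wrap a bare string in a single user message; copy lists through as-is."""
    if isinstance(messages, str):
        return [Message(role="user", text=messages)]
    return list(messages)


def run(*, messages: Union[str, list[Message]], streaming_callback=None) -> list[Message]:
    # The bare `*` makes both parameters keyword-only, so a positional
    # call such as run("hi") raises TypeError.
    return normalize_messages(messages)
```

With a coercion step like this at the top of the call, code that previously passed a prompt string to a Generator can pass the same string as `messages=` without constructing a message list by hand.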

Key Points
  • New PythonCodeSplitter uses AST parsing to keep functions, classes, and methods intact during chunking for code-RAG pipelines.
  • All ChatGenerators now accept plain strings, making migration from Generator to ChatGenerator a one-line change.
  • DALL-E image generator defaults to gpt-image-2, supporting arbitrary sizes and new quality options (auto/high/medium/low).

Why It Matters

Simplifies building code-aware RAG pipelines and reduces friction when upgrading to ChatGenerator, saving developers time.