
Haystack v2.30.0 adds PythonCodeSplitter and plain string ChatGenerator support

Haystack June 02, 2026

⚡Syntax-aware Python code splitting and streamlined ChatGenerator input in Haystack's latest release.

Deep Dive

deepset has released Haystack v2.30.0-rc1, a pre-release of the open-source LLM framework, with three major improvements. The headline feature is PythonCodeSplitter, a syntax-aware component that splits Python source files into coherent chunks for code-RAG and code search. It parses the source with Python's ast module and greedily merges logical units (module docstrings, import blocks, top-level functions, class headers, methods, and nested classes) into chunks of roughly max_effective_lines lines (default 80). Functions longer than oversized_factor * max_effective_lines fall back to line-based secondary splitting with overlap. Two optional flags make downstream retrieval more context-aware: strip_docstrings moves docstrings into metadata, and preserve_class_definition prepends the enclosing class signature to child chunks. Each chunk carries metadata including start_line, end_line, unit_kinds, decorators, and split_id.
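The greedy strategy described above can be sketched with the standard-library ast module alone. This is a minimal sketch, not Haystack's implementation: the Unit and Chunk classes, the helper names, the oversized_factor default of 1.5 and the 5-line overlap are assumptions, and the sketch counts raw lines rather than whatever Haystack treats as "effective" lines. It also omits strip_docstrings, preserve_class_definition and decorator metadata.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Unit:
    kind: str    # "docstring", "imports", "function", "class_header", "method", ...
    start: int   # first line, 1-based, inclusive
    end: int     # last line, 1-based, inclusive


@dataclass
class Chunk:
    text: str
    start_line: int
    end_line: int
    unit_kinds: list = field(default_factory=list)


def _start(node: ast.stmt) -> int:
    # Decorators sit above the def/class line, so the unit starts at the first one.
    return min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])


def _kind(node: ast.stmt, in_class: bool) -> str:
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        return "imports"
    if isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant) \
            and isinstance(node.value.value, str):
        return "docstring"  # any bare string statement is treated as a docstring
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        return "method" if in_class else "function"
    if isinstance(node, ast.ClassDef):
        return "nested_class" if in_class else "class"
    return "statement"


def collect_units(source: str) -> list:
    """Top-level statements become units; classes split into header + members."""
    units = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            first_body = _start(node.body[0])
            if first_body > node.lineno:  # one-line classes stay whole
                units.append(Unit("class_header", _start(node), first_body - 1))
                for member in node.body:
                    units.append(Unit(_kind(member, True), _start(member), member.end_lineno))
                continue
        units.append(Unit(_kind(node, False), _start(node), node.end_lineno))
    return units


def _line_windows(lines, unit, window, overlap):
    """Fallback for oversized units: fixed-size line windows that overlap."""
    assert 0 <= overlap < window, "overlap must be smaller than the window"
    out, start = [], unit.start
    while True:
        end = min(start + window - 1, unit.end)
        out.append(Chunk("\n".join(lines[start - 1:end]), start, end, [unit.kind]))
        if end >= unit.end:
            return out
        start += window - overlap


def split_python(source: str, max_effective_lines: int = 80,
                 oversized_factor: float = 1.5, overlap: int = 5) -> list:
    lines = source.splitlines()
    chunks, current = [], []

    def flush() -> None:
        if current:
            s, e = current[0].start, current[-1].end
            chunks.append(Chunk("\n".join(lines[s - 1:e]), s, e, [u.kind for u in current]))
            current.clear()

    for unit in collect_units(source):
        if unit.end - unit.start + 1 > oversized_factor * max_effective_lines:
            flush()
            chunks.extend(_line_windows(lines, unit, max_effective_lines, overlap))
            continue
        # Greedy step: close the current chunk if adding this unit would overflow it.
        if current and unit.end - current[0].start + 1 > max_effective_lines:
            flush()
        current.append(unit)
    flush()
    return chunks
```

Units are packed in source order until the next one would overflow the budget. A unit larger than max_effective_lines but below the oversized threshold still becomes a chunk of its own, so a function is only cut when it is clearly too large. In this sketch, blank lines and comments that fall between two chunks are dropped.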

Complementing the splitter, all Haystack ChatGenerator components (OpenAI, Azure, HuggingFace, and others) now accept a plain string for the messages parameter and automatically wrap it as a ChatMessage with the user role. This cuts boilerplate when switching from a Generator to a ChatGenerator.

DALLEImageGenerator has also been updated for OpenAI's retirement of the DALL-E models. The default model is now gpt-image-2 (gpt-image-1 and gpt-image-1-mini are also supported), quality accepts auto, high, medium, or low, and size accepts 1024x1024, 1024x1536, 1536x1024, or auto. The component now always returns base64-encoded JSON and ignores the response_format parameter.

One further change can break existing callers: LLM.run and LLM.run_async now require messages and streaming_callback to be passed as keyword arguments.
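The changes described above can be illustrated in plain Python. This is a minimal sketch, not Haystack's code: ChatMessage here is a hypothetical two-field stand-in for Haystack's class, and normalize_messages, ToyLLM and check_image_params are illustrative names. It shows the string-to-user-message wrapping, what keyword-only parameters look like (the bare `*` in the signature), and a pre-flight check against the model, quality and size values listed in the release notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class ChatMessage:
    # Hypothetical stand-in; Haystack's real ChatMessage class is richer.
    role: str
    text: str


def normalize_messages(messages: Union[str, List[ChatMessage]]) -> List[ChatMessage]:
    """Wrap a bare string as one user message; copy lists through unchanged."""
    if isinstance(messages, str):
        return [ChatMessage(role="user", text=messages)]
    return list(messages)


class ToyLLM:
    """Hypothetical component: the bare `*` makes every parameter keyword-only."""

    # The None default for streaming_callback is an assumption of this sketch.
    def run(self, *, messages: List[ChatMessage],
            streaming_callback: Optional[Callable[[str], None]] = None) -> dict:
        reply = f"echo: {messages[-1].text}"
        if streaming_callback is not None:
            streaming_callback(reply)
        return {"replies": [ChatMessage(role="assistant", text=reply)]}


# Accepted values as listed in the release notes; the model default is gpt-image-2.
IMAGE_MODELS = {"gpt-image-2", "gpt-image-1", "gpt-image-1-mini"}
IMAGE_QUALITIES = {"auto", "high", "medium", "low"}
IMAGE_SIZES = {"1024x1024", "1024x1536", "1536x1024", "auto"}


def check_image_params(*, quality: str, size: str, model: str = "gpt-image-2") -> None:
    """Hypothetical pre-flight check before configuring DALLEImageGenerator."""
    for name, value, allowed in (("model", model, IMAGE_MODELS),
                                 ("quality", quality, IMAGE_QUALITIES),
                                 ("size", size, IMAGE_SIZES)):
        if value not in allowed:
            raise ValueError(f"unsupported {name}: {value!r}")


if __name__ == "__main__":
    msgs = normalize_messages("What is Haystack?")
    print(ToyLLM().run(messages=msgs)["replies"][0].text)  # echo: What is Haystack?
    try:
        ToyLLM().run(msgs)  # positional call is rejected
    except TypeError as err:
        print("TypeError:", err)
```

With keyword-only parameters, a call such as `llm.run(messages=msgs)` keeps working, while a positional call like `llm.run(msgs)` raises TypeError, which is why code that passed arguments positionally to LLM.run needs updating.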

Key Points

PythonCodeSplitter uses ast module to keep functions and classes intact in code-RAG chunks, with optional docstring metadata and class signature prepending.
All 8+ ChatGenerators (OpenAI, Azure, HuggingFace, etc.) now accept plain strings as input, auto-wrapping as ChatMessage.
DALLEImageGenerator default model changed from dall-e-3 to gpt-image-2; quality and size parameters updated to match OpenAI's new API.

Why It Matters

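Syntax-aware splitting keeps functions and classes intact, which improves code understanding in RAG pipelines, while plain-string input to ChatGenerators simplifies developer workflows for LLM integration.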
