Haystack v2.30.0-rc1 adds PythonCodeSplitter and plain-string ChatGenerator input
Syntax-aware Python code splitting and simpler ChatGenerator input arrive in Haystack's latest release candidate.
deepset has released Haystack v2.30.0-rc1, a release candidate of its open-source LLM framework, with three major improvements.
The headline feature is PythonCodeSplitter, a syntax-aware component that splits Python source files into coherent chunks for code-RAG and code search. It parses the source with Python's ast module and greedily merges logical units (module docstrings, import blocks, top-level functions, class headers, methods, and nested classes) into chunks of roughly max_effective_lines lines (default 80). A function longer than oversized_factor * max_effective_lines falls back to line-based secondary splitting with overlap. Two optional flags add context for retrieval: strip_docstrings moves docstrings out of the chunk text and into metadata, and preserve_class_definition prepends the enclosing class signature to chunks taken from inside a class, so a method retrieved on its own still shows which class it belongs to. Each chunk carries metadata including start_line, end_line, unit_kinds, decorators, and split_id.
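The greedy merge and oversized fallback described above can be illustrated with a small, stdlib-only sketch. This is not Haystack's PythonCodeSplitter: the function split_python_source, the Chunk class, and the parameters max_lines, oversized_factor, and overlap (including their defaults) are hypothetical, and the sketch treats only top-level statements as units, so it does not descend into methods or nested classes, strip docstrings, or prepend class signatures.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Chunk:
    start_line: int  # 1-based, inclusive
    end_line: int  # 1-based, inclusive
    kinds: list = field(default_factory=list)
    text: str = ""


def _top_level_units(tree):
    """Yield (kind, start, end) for each top-level statement (1-based, inclusive)."""
    for i, node in enumerate(tree.body):
        # Decorators sit above the def/class line, so the unit starts at the first one.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            kind = "import"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        elif (i == 0 and isinstance(node, ast.Expr)
              and isinstance(node.value, ast.Constant)
              and isinstance(node.value.value, str)):
            kind = "docstring"
        else:
            kind = "statement"
        yield kind, start, node.end_lineno


def split_python_source(source, max_lines=80, oversized_factor=1.5, overlap=5):
    """Hypothetical splitter: greedily merge top-level units into ~max_lines chunks."""
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be in [0, max_lines)")
    lines = source.splitlines()
    chunks = []
    current = None  # chunk currently being grown

    def flush():
        nonlocal current
        if current is not None:
            current.text = "\n".join(lines[current.start_line - 1:current.end_line])
            chunks.append(current)
            current = None

    for kind, start, end in _top_level_units(ast.parse(source)):
        if end - start + 1 > oversized_factor * max_lines:
            # Oversized unit: fall back to fixed line windows that overlap.
            flush()
            s = start
            while True:
                e = min(s + max_lines - 1, end)
                chunks.append(Chunk(s, e, [kind + "_part"], "\n".join(lines[s - 1:e])))
                if e == end:
                    break
                s += max_lines - overlap
            continue
        # Close the current chunk if adding this unit would make it too long.
        if current is not None and end - current.start_line + 1 > max_lines:
            flush()
        if current is None:
            current = Chunk(start, end, [kind])
        else:
            current.end_line = end
            current.kinds.append(kind)
    flush()
    return chunks


small = '"""Doc."""\nimport os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n'
body = "".join(f"    x{i} = {i}\n" for i in range(20))
sample = small + "\ndef big():\n" + body + "    return 0\n"

chunks = split_python_source(sample, max_lines=6, oversized_factor=1.5, overlap=2)
for c in chunks:
    print(c.start_line, c.end_line, c.kinds)
```

In this toy run the docstring, import, and a() merge into one chunk, b() starts a new one because adding it would exceed six lines, and the 22-line big() exceeds 1.5 × 6 lines, so it is cut into overlapping six-line windows. Units longer than max_lines but under the oversized threshold stay whole in their own chunk, which is why chunk size is only approximately bounded; comments that fall between chunks are dropped in this simplification.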
Complementing the splitter, all Haystack ChatGenerator components (OpenAI, Azure, HuggingFace, and others) now accept a plain string for the messages parameter and automatically wrap it as a ChatMessage with the user role. This removes boilerplate when moving code from a Generator, which takes a prompt string, to a ChatGenerator.
DALLEImageGenerator has also been updated for OpenAI's retirement of the DALL-E models. The default model is now gpt-image-2 (gpt-image-1 and gpt-image-1-mini are also supported), quality now takes auto, high, medium, or low, and size accepts 1024x1024, 1024x1536, 1536x1024, or auto. The component now always returns images as base64-encoded data (the b64_json response format) and ignores the response_format parameter.
Separately, LLM.run and LLM.run_async now require messages and streaming_callback to be passed as keyword arguments, so calls that pass them positionally must be updated.
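The two calling-convention changes above, plain-string input and keyword-only arguments, can be sketched without Haystack installed. Message, normalize_messages, and EchoChatGenerator are hypothetical stand-ins, not Haystack's ChatMessage or any real generator or LLM component; the sketch assumes the general pattern of a str check that wraps the input as a user-role message, combined with Python's bare `*` marker for keyword-only parameters. Whether Haystack implements it exactly this way is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (role + text only)."""
    role: str
    text: str


def normalize_messages(messages: Union[str, List[Message]]) -> List[Message]:
    """Wrap a bare string as one user message; copy a list through unchanged."""
    if isinstance(messages, str):
        return [Message(role="user", text=messages)]
    return list(messages)


class EchoChatGenerator:
    """Hypothetical component that echoes the last message instead of calling an LLM."""

    # The bare `*` makes every parameter after it keyword-only, so a call such as
    # run("hi") raises TypeError while run(messages="hi") is accepted.
    def run(
        self,
        *,
        messages: Union[str, List[Message]],
        streaming_callback: Optional[Callable[[str], None]] = None,
    ) -> dict:
        msgs = normalize_messages(messages)
        reply = f"echo: {msgs[-1].text}"
        if streaming_callback is not None:
            streaming_callback(reply)  # a real generator would stream chunk by chunk
        return {"replies": [Message(role="assistant", text=reply)]}


gen = EchoChatGenerator()
streamed = []
out = gen.run(messages="What is Haystack?", streaming_callback=streamed.append)
print(out["replies"][0].text)

try:
    gen.run("positional call")  # rejected: messages is keyword-only
    positional_rejected = False
except TypeError:
    positional_rejected = True
print("positional call rejected:", positional_rejected)
```

Normalizing at the top of run keeps the rest of the method working on a single list type, and keyword-only parameters let a library reorder or add arguments later without silently misrouting positional values.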
- PythonCodeSplitter uses Python's ast module to keep functions and classes intact in code-RAG chunks, with optional docstring-to-metadata extraction and class-signature prepending.
- All ChatGenerators (8+, including OpenAI, Azure, and HuggingFace) now accept a plain string as input and automatically wrap it as a user-role ChatMessage.
- DALLEImageGenerator's default model changed from dall-e-3 to gpt-image-2; the quality and size values were updated to match OpenAI's GPT Image models.
Why It Matters
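Syntax-aware chunking keeps functions and classes intact, so code-RAG and code-search pipelines retrieve complete units rather than fragments cut mid-function, and plain-string input removes ChatMessage boilerplate from simple ChatGenerator calls. The DALLEImageGenerator and LLM.run changes may require updates to existing code before upgrading.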