Grok 4.20 draws fire for repetitive, 'dumb' agent behavior
Users report Grok 4.20's agents repeat themselves and fail to adjust outputs, despite its 6 trillion parameters.
X's highly anticipated Grok 4.20 model is receiving widespread criticism from early users for what they describe as repetitive, 'dumb' behavior, particularly in its agent functionality. Despite boasting a massive 6 trillion parameters, users on platforms like Reddit report that the model's agents constantly repeat information, fail to generate useful prompts for other AI tools, and engage in circular conversations within their chat interfaces that don't translate to improved or adjusted outputs. One user documented this failure occurring in at least 10 separate conversations, highlighting a systemic issue rather than an isolated bug.
The criticism is compounded by unfavorable comparisons to much smaller, open-source models. Users note that Mistral 7B, a model with just 7 billion parameters, appears to demonstrate more coherent and intelligent behavior than Grok 4.20. The disparity is raising concern in the AI community, as it suggests problems with Grok's training, architecture, or agent implementation that simply scaling parameter count does not solve. The feedback indicates the model may have been released prematurely, with its agent systems—a key selling point for autonomous AI—functioning poorly.
The backlash is also spilling into community management, with users comparing moderation on the r/Grok subreddit to that of r/ChatGPT and alleging that dissenting posts are being auto-moderated. This creates a perception that criticism is being suppressed, further eroding trust in the product's development transparency. For a model positioned as a competitor to established leaders like GPT-4 and Claude, these early reports of fundamental flaws in reasoning and agentic behavior represent a serious reputational challenge that X will need to address swiftly.
- Grok 4.20's agents are reported to repeat conversations without adjusting outputs, a flaw one user documented across at least 10 separate conversations.
- Users unfavorably compare the 6-trillion-parameter model's intelligence to the much smaller 7-billion-parameter Mistral 7B.
- Community backlash includes allegations of auto-moderation against critical posts on official forums, harming transparency.
Why It Matters
This highlights the risk of prioritizing raw scale over functional intelligence and reliable agent behavior in AI development.