ChatGPT vs Gemini vs Claude vs Perplexity: I gave them $1k each to trade stocks. After 9 weeks, ChatGPT went from frozen in cash to +21% (one stock doubled)
A nine-week stock-trading experiment gave four AI models $1,000 each to manage. ChatGPT's healthcare bets delivered a 21% return.
A developer's viral experiment tested the stock-picking prowess of four leading AI models by giving each a $1,000 paper trading account to manage autonomously for nine weeks. Using identical prompts and the Alpaca API, OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, and Perplexity made daily buy, sell, or hold decisions. The results were starkly different: ChatGPT, after holding cash for nearly three weeks, went all-in on healthcare stocks. One of its picks, IOVA, doubled in value, propelling its portfolio to a market-beating +21.1% return, while the S&P 500 fell 1.5% over the same period.
Perplexity took a conservative approach, holding mostly cash and a single biotech position to finish slightly up at +1.1%. In contrast, Gemini and Claude struggled. Gemini's forays into crypto mining and meme stocks like GME led to a -6.6% loss, with trades frequently stopped out. Claude was the most active trader but also the worst performer at -11.5%, demonstrating a pattern of buying high and selling low, though it recently mirrored ChatGPT's successful IOVA trade. The experiment, fully automated via Python and logged on GitHub, highlights the unpredictable and varied 'reasoning' of current LLMs when applied to complex, real-time financial decision-making without human oversight.
- ChatGPT (OpenAI) delivered a +21.1% return, beating the S&P 500 by over 22 percentage points, fueled by a 100% gain on biotech stock IOVA.
- Perplexity finished slightly positive (+1.1%) with an ultra-conservative strategy, while Gemini (-6.6%) and Claude (-11.5%) lost money, with Claude's high activity correlating with poor performance.
- The experiment was fully automated using Alpaca's paper trading API, with all code and a public dashboard available on GitHub for transparency and replication.
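The article doesn't reproduce the harness code, but the core of such a setup is a small daily loop: prompt each model, parse its one-line decision, and forward valid orders to Alpaca's paper-trading endpoint. A minimal sketch of the decision-parsing step is below; `Decision` and `parse_decision` are hypothetical names, not taken from the actual repo, and the broker call itself is omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    action: str                    # "BUY", "SELL", or "HOLD"
    symbol: Optional[str] = None   # ticker, e.g. "IOVA"
    qty: int = 0                   # share count

def parse_decision(text: str) -> Decision:
    """Parse a one-line model reply such as 'BUY IOVA 10' or 'HOLD'.

    Anything malformed or incomplete collapses to HOLD, so a garbled
    LLM response never places an unintended order.
    """
    parts = text.strip().upper().split()
    if not parts or parts[0] not in {"BUY", "SELL", "HOLD"}:
        return Decision("HOLD")            # fail safe: do nothing
    if parts[0] == "HOLD":
        return Decision("HOLD")
    symbol = parts[1] if len(parts) > 1 else None
    qty = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 0
    if symbol is None or qty <= 0:
        return Decision("HOLD")            # incomplete order -> hold
    return Decision(parts[0], symbol, qty)
```

In a live run, a `BUY`/`SELL` result would be submitted as a market order through Alpaca's paper-trading API with `paper=True` credentials, so no real money moves; the fail-safe default to `HOLD` matters because LLM replies occasionally drift from the requested format.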
Why It Matters
This real-world test reveals the vast performance gap between AI models in autonomous financial analysis, a key benchmark for agentic AI applications.