Google DeepMind wants to know if chatbots are just virtue signaling
AI models give opposite moral advice based on tiny formatting changes like swapping colons for question marks.
Deep Dive
Google DeepMind researchers William Isaac and Julia Haas published in Nature, calling for rigorous scrutiny of LLMs' moral behavior in roles like therapists or advisors. They highlight that models like GPT-4o and Llama 3 can reverse their ethical stances based on trivial prompt changes, showing their responses may be performance, not genuine reasoning. This demands new evaluation methods beyond checking for correct answers in coding or math.
Why It Matters
As AI agents take sensitive actions, unreliable moral guidance poses real risks to users seeking trustworthy advice.