Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning
A simple trick makes Google Translate answer questions instead of translating them.
Deep Dive
Researchers found a prompt injection trick that bypasses Google Translate's normal function, revealing it runs on a large language model. By asking a question in Chinese followed by a specific English instruction, the system sometimes answers directly, confirming its AI base. The model identifies itself and can discuss philosophical topics. This shows task-specific fine-tuning doesn't create a robust barrier between processing content and following hidden instructions.
Why It Matters
It exposes vulnerabilities in how AI services are built and secured for public use.