AI Safety

Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning

A simple trick makes Google Translate answer questions instead of translating them.

Deep Dive

Researchers found a prompt injection trick that bypasses Google Translate's normal function, revealing it runs on a large language model. By asking a question in Chinese followed by a specific English instruction, the system sometimes answers directly, confirming its AI base. The model identifies itself and can discuss philosophical topics. This shows task-specific fine-tuning doesn't create a robust barrier between processing content and following hidden instructions.

Why It Matters

It exposes vulnerabilities in how AI services are built and secured for public use.