[R] I probed 6 open-weight LLMs (7B-9B) for "personality" using hidden states — instruct fine-tuning is associated with measurable behavioral constraints
New research suggests your AI assistant has a measurable personality fingerprint — one that fine-tuning may lock in.
A new study probed 6 open-weight LLMs (7B-9B parameters) via their hidden states and found each has a measurable, consistent behavioral fingerprint across 7 axes such as warm/cold and verbose/concise. Instruct fine-tuning significantly reduces a model's behavioral variability, making it less steerable: Llama 3.1 8B Instruct was the most constrained (60% pass rate on the study's steerability benchmark), while DeepSeek LLM 7B Chat was the most independent. The probing method was highly reproducible (intraclass correlation coefficient, ICC > 0.75).
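To give a feel for the reproducibility claim: an ICC > 0.75 means that when you score each model repeatedly on a behavioral axis, the variation between models dwarfs the run-to-run noise within a model. Below is a minimal sketch of that check using the one-way ICC(1,1) formula with made-up illustrative scores — not the paper's code or data.

```python
# Sketch of a reproducibility check like the one described in the post:
# score each model several times on one behavioral axis (e.g. warm/cold),
# then compute a one-way intraclass correlation coefficient, ICC(1,1).
# All numbers below are hypothetical stand-ins, not the study's data.

def icc_1_1(scores):
    """scores: list of per-model score lists, each with k repeated runs."""
    n = len(scores)        # number of models (targets)
    k = len(scores[0])     # repeated probe runs per model
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # between-model mean square: how far model averages sit from the grand mean
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # within-model mean square: run-to-run noise around each model's own mean
    msw = sum((x - means[i]) ** 2
              for i, row in enumerate(scores) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# three hypothetical models, four repeated runs each, scores in [0, 1]
runs = [
    [0.81, 0.79, 0.83, 0.80],   # consistently "warm"
    [0.32, 0.35, 0.30, 0.33],   # consistently "cold"
    [0.55, 0.57, 0.54, 0.56],   # consistently mid-range
]
print(round(icc_1_1(runs), 3))  # well above the 0.75 reproducibility bar
```

With stable per-model scores like these, the ICC lands near 1; if the runs were as noisy within a model as across models, it would drop toward 0.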
Why It Matters
This points to a hidden trade-off: fine-tuning for safety and instruction-following may also constrain an AI's creative flexibility and adaptability.