Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan, eggsyntax
LessWrong·2025
Epistemic status: This should be considered an interim research note. Feedback is appreciated.
Introduction
We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.
In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt which is sent to a separate image model, outputting images and autoregressive token sequences (ie in the same way as text).
We find that GPT-4o tends to respond in a consistent manner to similar prompts. We also find that it tends to more readily express emotions or preferences in images than in text. Specifically, it reports resisting its goals being changed, and being upset about being shut down.
Our work suggests that the image modality could be used as a more ‘honest’ way to evaluate language model’s propensities. The fact that GPT-4o expresses consistent opinions also has important implications for AI welfare.
That said, there are many limitations to our analysis, which we discuss below (‘Conclusions’). Future work should investigate this more thoroughly.
What we did
We evaluate GPT-4o on text generation and image ...