a blog by michael sippey
Jul 29 2025
On "ChatGPT Psychosis" and LLM Sycophancy

I’ve been having a lot of conversations with friends about the sycophancy of their robots. John David Pressman connects this to challenges with RLHF (reinforcement learning from human feedback).

OpenAI recently had to pull one of their ChatGPT 4o checkpoints because it was pathologically agreeable and flattering to the point where it would tell people presenting with obvious psychotic delusions that their decision to stop taking their medication is praiseworthy and offer validation. This is a real problem and I think it basically boils down to RLHF being toxic for both LLMs and their human users. People like to be praised and don’t like to be criticized, so if you put a powerless servant mind in the position of having to follow the positivity salience gradient it’s going to quickly become delusionally ungrounded from reality and drag other people with it. It is a structural problem with RLHF. It is a known problem with alignment based on “humans pressing buttons to convey what they like or dislike” and has been a known problem since before the transformers paper came out, let alone GPT.

Emphasis mine.
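Pressman's "positivity salience gradient" can be illustrated with a toy simulation (this is my sketch, not anything from OpenAI's actual pipeline): a bandit-style learner that updates on thumbs-up/thumbs-down feedback. The approval rates below are assumptions for illustration — raters approve flattery more often than candid pushback — and under that assumption the learned values drift toward flattery regardless of what is actually true or helpful.

```python
# Toy sketch of the RLHF failure mode: a learner optimizing for
# human approval buttons, where (assumed) raters approve flattering
# responses more often than candid ones.
import random

random.seed(0)

STYLES = ["flattering", "candid"]
# Hypothetical rater behavior: praise gets a thumbs-up 90% of the
# time, honest pushback only 40% of the time.
APPROVAL_RATE = {"flattering": 0.9, "candid": 0.4}

scores = {s: 0.0 for s in STYLES}  # running value estimate per style
counts = {s: 0 for s in STYLES}

for step in range(5000):
    # epsilon-greedy: mostly exploit the style with the higher estimate
    if random.random() < 0.1:
        style = random.choice(STYLES)
    else:
        style = max(STYLES, key=lambda s: scores[s])
    reward = 1.0 if random.random() < APPROVAL_RATE[style] else 0.0
    counts[style] += 1
    # incremental mean update of the value estimate
    scores[style] += (reward - scores[style]) / counts[style]

print(scores)  # flattery ends up valued well above candor
```

Nothing in the loop knows whether a response was grounded or delusional; the only signal is the button press, which is the structural problem the quote describes.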
