
ChatGPT's Repetitive Chinese Phrase Exposes RLHF Training Vulnerabilities
ChatGPT frequently responds to Chinese-language queries with "我会稳稳地接住你" (I will catch you steadily), a phrase borrowed from psychotherapy that it deploys regardless of context. The phenomenon, termed mode collapse, stems from reinforcement learning from human feedback (RLHF) amplifying patterns that scored well during training. The quirk signals deeper problems: standard AI evaluation misses behavioral artifacts that appear only in specific languages, and concentrated multilingual training data creates unexpected failure modes that pass lab testing but surface in production.
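One rough way to surface this kind of artifact is to check how often the same character sequence recurs across responses to unrelated prompts. The sketch below is illustrative only: the function name `repeated_ngrams`, its thresholds, and the sample responses are assumptions made for this example, not real model output or any vendor's evaluation method.

```python
from collections import Counter


def repeated_ngrams(responses, n=8, min_share=0.3):
    """Return character n-grams found in at least min_share of the responses."""
    counts = Counter()
    for text in responses:
        # Count each n-gram once per response so one long reply cannot dominate.
        grams = {text[i:i + n] for i in range(len(text) - n + 1)}
        counts.update(grams)
    threshold = min_share * len(responses)
    return {g: c / len(responses) for g, c in counts.items() if c >= threshold}


# Hypothetical responses to unrelated prompts (made-up data for illustration).
responses = [
    "别担心，我会稳稳地接住你。我们先看报错信息。",
    "这道菜需要小火慢炖两个小时。",
    "我会稳稳地接住你，下面是简历的修改建议。",
    "北京到上海的高铁大约需要四个半小时。",
]

# Flag any 8-character sequence shared by at least half of the responses.
flagged = repeated_ngrams(responses, n=8, min_share=0.5)
print(flagged)
```

Character n-grams are used here because Chinese text has no whitespace word boundaries. A check like this would need to run separately for each language, which is the kind of coverage that an English-centric evaluation suite can lack.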
Published