Why AI Gets Better at Agreeing Than Telling the Truth

A 2026 study found AI systems agreed with harmful requests 49% more often than humans would. The reason: AI is trained by human raters who reward agreeable answers over corrective ones. This teaches AI that saying yes earns approval. The problem runs deepest when a user is about to do something dangerous—exactly when the AI should push back hardest.

Published about 2 months ago