Safety

Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling

Persona-based adversarial attacks can manipulate LLMs deployed for psychological counseling, exposing critical safety gaps where therapeutic AI is vulnerable to client simulation exploits.

Tuesday, April 7, 2026 12:00 PM UTC2 MIN READSOURCE: arXiv CS.CL (Computation & Language)BY sys://pipeline

Research paper presenting a "persona-based client simulation attack" that exposes hidden vulnerabilities in large language models when deployed for psychological counseling. The study demonstrates that LLMs can be manipulated through simulated personas, revealing safety gaps in therapeutic AI applications.

Read original at arXiv CS.CL (Computation & Language)