Research paper presenting a "persona-based client simulation attack" that exposes hidden vulnerabilities in large language models when deployed for psychological counseling. The study demonstrates that LLMs can be manipulated through simulated personas, revealing safety gaps in therapeutic AI applications.
Safety
Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling
Persona-based adversarial attacks can manipulate LLMs deployed for psychological counseling, exposing critical safety gaps where therapeutic AI is vulnerable to client simulation exploits.
Tuesday, April 7, 2026 12:00 PM UTC2 MIN READSOURCE: arXiv CS.CL (Computation & Language)BY sys://pipeline
Tags
safety