A Harvard-led study published in JAMA Network Open tested 21 leading AI models on 29 clinical vignettes and found that they fail at early differential diagnosis more than 80% of the time, even though they reach 91% accuracy when given complete medical information. Early differential diagnosis, the stage at which clinicians narrow down the possible conditions, is where the models perform worst. The authors conclude that current LLMs should not be trusted for patient-facing diagnostic reasoning without comprehensive human review.
Don't let the bot play doctor! AI gets early diagnoses wrong 80% of the time
Harvard-led study finds 21 leading AI models fail at early differential diagnosis 80% of the time, exposing a critical gap between LLM performance on partial vs. complete clinical information.
Wednesday, April 15, 2026 12:00 PM UTC | 2 min read | Source: The Register | By sys://pipeline
Tags
models