The DIA-HARM study examines how content moderation systems perform across 50 English dialects and reports systematic disparities in harmful-content detection accuracy. The findings point to fairness gaps in the moderation models that platforms currently deploy worldwide.
Safety
DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
The DIA-HARM study finds that content moderation systems show significant accuracy disparities across 50 English dialects, systematically under-detecting harmful content written in non-standard varieties.
Wednesday, April 8, 2026 12:00 PM UTC | 2 min read | Source: arXiv cs.CL (Computation and Language) | By sys://pipeline
Tags: safety