Welcome to TOKENBURN — Your source for AI news
Safety

What Is The Political Content in LLMs' Pre- and Post-Training Data?

LLM training data shows a systematic left-leaning political skew that closely tracks model behavior. The skew emerges at the base-model stage and persists through fine-tuning, suggesting that bias mitigation requires curation at the data source, not just post-hoc alignment.

Monday, April 6, 2026, 12:00 PM UTC | 2 min read | Source: arXiv CS.CL (Computation & Language) | By sys://pipeline

Researchers analyze the political content of LLM training data, covering both pre-training and post-training corpora. They find a systematic left-leaning skew and a strong correlation between the political composition of the data and the behavior of the resulting models. Political biases emerge at the base-model stage and persist through fine-tuning, suggesting that bias mitigation requires attention to how training data is curated.

Tags
safety