APPA proposes an approach to RLHF that handles pluralistic (diverse) human preferences in federated learning settings. It addresses the fairness concerns that arise when aligning LLMs across distributed preference distributions, enabling a model to balance competing user preferences without centralizing data.
Safety
APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
A federated RLHF method that learns fair LLM alignment from competing human preferences without pooling data centrally, enabling models to balance conflicting user values.
Tuesday, April 7, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning)
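The summary describes federated RLHF over conflicting preference groups, but does not detail APPA's actual algorithm. As a minimal sketch of the general idea, the following assumes a linear Bradley-Terry reward model trained locally on each client's preference pairs, and a loss-weighted server aggregation (in the spirit of fair federated learning) so that poorly-served preference groups pull the global model harder; all function names and the aggregation rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

def local_reward_update(theta, pref_pairs, lr=0.1):
    """One local step of Bradley-Terry reward-model training on a
    client's preference pairs (chosen_features, rejected_features)."""
    grad = np.zeros_like(theta)
    for x_w, x_l in pref_pairs:
        # P(chosen > rejected) under a linear reward r(x) = theta . x
        margin = theta @ (x_w - x_l)
        p = 1.0 / (1.0 + np.exp(-margin))
        # Gradient of the negative log-likelihood -log p
        grad += (p - 1.0) * (x_w - x_l)
    grad /= len(pref_pairs)
    return theta - lr * grad

def fair_aggregate(theta, client_updates, client_losses, alpha=2.0):
    """Server step (illustrative): weight each client's proposed update
    by its current loss raised to alpha, so preference groups the model
    serves poorly receive more weight than well-served ones."""
    weights = np.array(client_losses, dtype=float) ** alpha
    weights /= weights.sum()
    deltas = np.array([u - theta for u in client_updates])
    return theta + (weights[:, None] * deltas).sum(axis=0)

# Two clients with directly conflicting values: A prefers feature 0,
# B prefers feature 1. Their data never leaves the client.
theta = np.zeros(3)
pairs_a = [(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))]
pairs_b = [(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]))]
updates = [local_reward_update(theta, p) for p in (pairs_a, pairs_b)]

# Equal losses -> the conflicting updates balance out exactly.
balanced = fair_aggregate(theta, updates, [np.log(2), np.log(2)])

# A higher loss for client A -> the global model tilts toward A.
tilted = fair_aggregate(theta, updates, [1.0, 0.1])
```

With symmetric, equally-weighted disagreement the two gradients cancel (`balanced` stays at zero), while unequal losses shift the reward model toward the worse-off group (`tilted[0] > 0`), which is the fairness behavior the dek alludes to.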
Tags
safety