Academic paper revealing a critical vulnerability in fine-tuning open-source LLMs: model creators can extract proprietary downstream fine-tuning data via backdoor attacks using only black-box access. Experiments across 4 models show extraction rates up to 76–95%, with proposed defenses bypassable.
Safety
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Model creators can extract proprietary downstream fine-tuning data from open-source LLMs via black-box backdoor attacks at 76–95% extraction rates, turning model maintainers into a supply-chain attack vector.
Monday, April 6, 2026 12:00 PM UTC2 MIN READSOURCE: arXiv CS.CL (Computation & Language)BY sys://pipeline
Tags
safety
/// RELATED